APEX

The AI Productivity Index (APEX) assesses whether frontier models are capable of performing economically valuable tasks across four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD).


The APEX leaderboard

We created APEX to bridge the gap between what professionals want from AI systems and what benchmarks test for. The prompts are realistic, challenging, and diverse, and each is accompanied by source documents. Each case was created by a veteran industry expert to capture their day-to-day work.

The leaderboard is based on a hidden, held-out set of 400 tasks (n=100 per job). For each task, we sample 8 responses from each model, grade every response with a judge LM, and report the mean score.
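The scoring protocol above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the APEX eval harness: it assumes the judge LM returns one score in [0, 1] per response, and it assumes the "±" on the leaderboard is a 95% interval from a percentile bootstrap over tasks, which the page does not actually specify. The names `task_means` and `leaderboard_score` are hypothetical.

```python
import random
import statistics

# Hypothetical input shape: task id -> judge scores in [0, 1], one per run.
Scores = dict[str, list[float]]


def task_means(scores: Scores) -> dict[str, float]:
    """Average the judge scores of the repeated runs (8 in APEX) for each task."""
    return {task: statistics.fmean(runs) for task, runs in scores.items()}


def leaderboard_score(scores: Scores, n_boot: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Return (mean, half-width of a 95% interval), both in percentage points.

    The mean is taken over task-level means. The interval is a percentile
    bootstrap that resamples tasks; this is an assumption, since the
    leaderboard does not say how its ± values are computed.
    """
    per_task = list(task_means(scores).values())
    mean = statistics.fmean(per_task)
    rng = random.Random(seed)
    # Resample tasks with replacement and recompute the mean each time.
    boot = [
        statistics.fmean(rng.choices(per_task, k=len(per_task)))
        for _ in range(n_boot)
    ]
    cuts = statistics.quantiles(boot, n=40)  # cut points every 2.5%
    lo, hi = cuts[0], cuts[-1]               # 2.5th and 97.5th percentiles
    return 100 * mean, 100 * (hi - lo) / 2


if __name__ == "__main__":
    # Synthetic data only: 400 tasks x 8 runs of made-up judge scores.
    rng = random.Random(42)
    fake = {f"task-{i}": [rng.random() for _ in range(8)] for i in range(400)}
    mean, half_width = leaderboard_score(fake)
    print(f"{mean:.1f}% ± {half_width:.1f}%")
```

Because every task gets the same number of runs, the mean of task-level means equals the mean over all responses. The sketch resamples tasks rather than individual runs on the assumption that tasks are the unit being sampled from the broader distribution of professional work.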

To support open research, we have released 100 in-distribution APEX cases on Hugging Face, along with our eval harness.


Model                   Score
GPT 5 (High)            67.0% ± 2.4%
GPT 5.2 Pro (High)      66.8% ± 2.6%
Gemini 3 Pro (High)     64.3% ± 2.3%
Gemini 3 Flash (High)   64.0% ± 2.2%
Grok 4                  63.5% ± 2.5%


Jobs covered in APEX

Big Law Associate

Drafts and reviews contracts, conducts legal research, and advises clients on regulatory and transactional matters. Collaborates with partners on litigation, mergers and acquisitions, and compliance while managing heavy workloads across cases.

Advised by Cass Sunstein—Harvard law professor, former White House Regulatory Administrator, and top-cited legal scholar.

Experts from Latham & Watkins, Skadden, Cravath.

Top scores: 78%, 77%, 76%

Management Consultant

Analyzes industries, evaluates markets, and builds strategic or financial models to guide client decisions. Work often includes preparing presentations, drafting reports, and synthesizing research into actionable recommendations.

Advised by Dominic Barton—former McKinsey Global Managing Director and Canadian Ambassador to China.

Experts from McKinsey, BCG, Deloitte, Accenture, EY.

Gemini 3 Pro (High): 64%
Gemini 3 Flash (High): 64%
GPT 5.2 Pro (High): 64%

General Practitioner (MD)

Diagnoses and treats a wide range of patient conditions, from acute illnesses to chronic diseases. Reviews medical histories, orders and interprets tests, prescribes treatments, and provides preventative care and ongoing patient guidance.

Advised by Eric Topol: cardiologist, geneticist, founder of the Scripps Research Translational Institute, and a leading voice in digital and precision medicine.

Experts from University of Pennsylvania, Northwestern, Cornell, Brigham & Women’s, Mount Sinai.

GPT 5 (High): 66%
Opus 4.5 (On): 65%
GPT 5.2 Pro (High): 65%

Investment Banking Analyst

Builds financial models, values companies, and prepares pitch materials for potential deals. Responsibilities include conducting industry research, supporting transaction execution, and producing client-ready presentations under tight deadlines.

Experts from Goldman Sachs, Morgan Stanley, JPMorgan, Barclays, UBS, Bank of America, Evercore.

GPT 5.2 Pro (High): 64%
Gemini 3 Pro (High): 63%
GPT 5 (High): 61%