
The AI Productivity Index

The AI Productivity Index (APEX) assesses whether frontier models are capable of performing economically valuable tasks across four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD).

The APEX leaderboard

We created APEX to bridge the gap between what professionals want from AI systems and what benchmarks test for. The prompts are realistic, challenging, and diverse, and each comes with source documents. Each case was created by a veteran industry expert to reflect their day-to-day work.

The leaderboard is based on a hidden held-out set of 400 tasks (n=100 per job). For each task, we collect 8 responses from each model, grade every response with an LM judge, and report the mean score.
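The scoring protocol above (8 sampled responses per task, each graded by a judge, then averaged) can be sketched as below. This is a minimal illustration, not the APEX eval harness: the data layout, the function names `job_score` and `leaderboard`, and the assumption that each judge grade is a number in [0, 1] are all hypothetical.

```python
from statistics import mean

# Hypothetical data layout (an assumption, not the APEX harness format):
# grades[job][task_id] -> list of judge scores in [0, 1], one per sampled response.
Grades = dict[str, dict[str, list[float]]]

RUNS_PER_TASK = 8  # APEX collects 8 responses per model per task


def job_score(tasks: dict[str, list[float]], runs_per_task: int = RUNS_PER_TASK) -> float:
    """Mean judge score over every sampled response for one job."""
    for task_id, runs in tasks.items():
        if len(runs) != runs_per_task:
            raise ValueError(
                f"task {task_id!r} has {len(runs)} runs, expected {runs_per_task}"
            )
    # With the same number of runs per task, the mean over all responses
    # equals the mean of the per-task means.
    return mean(score for runs in tasks.values() for score in runs)


def leaderboard(grades: Grades, runs_per_task: int = RUNS_PER_TASK) -> dict[str, float]:
    """Per-job mean score as a percentage, rounded to the nearest integer."""
    return {job: round(100 * job_score(tasks, runs_per_task)) for job, tasks in grades.items()}


if __name__ == "__main__":
    toy = {
        "big_law": {
            "task_a": [1.0] * 4 + [0.0] * 4,  # per-task mean 0.5
            "task_b": [1.0] * 8,              # per-task mean 1.0
        }
    }
    print(leaderboard(toy))  # {'big_law': 75}
```

Checking that every task has exactly 8 runs keeps the grand mean equal to the mean of per-task means; if run counts varied, the two would diverge and one would have to choose which to report.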

To support open research, we have open-sourced n=100 cases drawn from the same distribution as APEX on Hugging Face. We have also shared our eval harness for reproducibility.

Jobs covered in APEX

Consulting Associate
Analyzes industries, evaluates markets, and builds strategic or financial models to guide client decisions. Work often includes preparing presentations, drafting reports, and synthesizing research into actionable recommendations.

Advised by Dominic Barton—former McKinsey Global Managing Director and Canadian Ambassador to China.

Experts from McKinsey, BCG, Deloitte, Accenture, EY

Model                  ID                Mean score
Gemini 3 Pro (High)    gemini-3-pro      64%
GPT 5 (High)           gpt-5             63%
Grok 4                 grok-4            60%

Investment Banking Analyst
Builds financial models, values companies, and prepares pitch materials for potential deals. Responsibilities include conducting industry research, supporting transaction execution, and producing client-ready presentations under tight deadlines.

Experts from Goldman Sachs, Morgan Stanley, JPMorgan, Barclays, UBS, Bank of America, Evercore

Model                  ID                Mean score
Gemini 3 Pro (High)    gemini-3-pro      63%
GPT 5 (High)           gpt-5             61%
Grok 4                 grok-4            60%

Big Law Associate
Drafts and reviews contracts, conducts legal research, and advises clients on regulatory and transactional matters. Collaborates with partners on litigation, mergers and acquisitions, and compliance while managing heavy workloads across cases.

Advised by Cass Sunstein—Harvard law professor, former White House Regulatory Administrator, and top-cited legal scholar.

Experts from Latham & Watkins, Skadden, Cravath

Model                  ID                Mean score
GPT 5 (High)           gpt-5             78%
GPT 5.1 (High)         gpt-5-1           77%
o3 (On)                o3                76%

General Practitioner (MD)
Diagnoses and treats a wide range of patient conditions, from acute illnesses to chronic diseases. Reviews medical histories, orders and interprets tests, prescribes treatments, and provides preventative care and ongoing patient guidance.

Advised by Eric Topol, a cardiologist and geneticist who founded the Scripps Research Translational Institute and is a leading voice in digital and precision medicine.

Experts from University of Pennsylvania, Northwestern, Cornell, Brigham & Women’s, Mount Sinai

Model                  ID                Mean score
GPT 5 (High)           gpt-5             66%
Opus 4.5 (On)          claude-opus-4-5   65%
Grok 4                 grok-4            64%