The AI Productivity Index (APEX) assesses whether frontier models can perform economically valuable tasks in four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD).
The APEX leaderboard ranks frontier models on a hidden held-out set of 400 tasks (n=100 per job).
GPT 5 (High): 67% ± 2.4%
GPT 5.2 Pro (High): 66.8% ± 2.6%
Gemini 3 Pro (High): 64.3% ± 2.3%
The ACE leaderboard ranks frontier models on the hidden held-out evaluation set for ACE. Each score is a model's average across the four activities.
GPT 5 (High): 56.1% ± 3.3%
o3 Pro (High): 55.2% ± 3.2%
GPT 5.1 (High): 55.1% ± 3.2%
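As a rough illustration of how an ACE headline score could be formed from per-activity results, here is a minimal sketch. It assumes the average is an unweighted mean of four per-activity percentages; the leaderboard does not state the weighting or how the ± margin is derived, so neither is reproduced here. The function name `overall_score`, the activity labels, and all numbers are hypothetical and are not real leaderboard data.

```python
from statistics import mean


def overall_score(per_activity: dict[str, float]) -> float:
    """Return the unweighted mean of per-activity scores (in percent).

    Assumption: every activity counts equally toward the headline score.
    """
    if not per_activity:
        raise ValueError("need at least one activity score")
    return mean(list(per_activity.values()))


# Hypothetical per-activity scores for one model (illustrative only).
example = {
    "activity_a": 60.0,
    "activity_b": 50.0,
    "activity_c": 55.0,
    "activity_d": 59.0,
}

print(f"{overall_score(example):.1f}%")  # prints "56.0%"
```

If the activities had different numbers of tasks, a task-weighted mean would give a different figure, so the equal-weight choice here should be read as an assumption rather than the leaderboard's documented method.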