The AI Productivity Index for Agents

The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services.

The APEX-Agents leaderboard

We created APEX-Agents to evaluate agents on the real day-to-day work of professionals: investment banking analysts, management consultants, and corporate lawyers. The tasks require agents to reason, demonstrate advanced knowledge, use multiple applications, and plan over long horizons.

APEX-Agents was built in three steps. First, industry professionals spent 5-10 days creating a data-rich world based on a unique project scenario. Second, they wrote realistic, challenging tasks using the files within that world. Third, we gave agents access to the world, with all of the software a human would use, so they could execute the tasks.


There are 33 worlds in APEX-Agents, comprising 480 tasks with grading rubrics. The full APEX-Agents dataset is open source, as is Archipelago, our infrastructure for executing and evaluating agent trajectories.
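The relationship between worlds, tasks, and grading rubrics can be pictured as a small data model. The sketch below is hypothetical: the class names, the binary met/not-met criteria, and the rule that a task counts as passed only when every rubric criterion is met are illustrative assumptions, not the published APEX-Agents or Archipelago schema or scoring method.

```python
from dataclasses import dataclass, field

# Hypothetical data model for illustration only; not the actual
# APEX-Agents or Archipelago schema.

@dataclass
class Criterion:
    description: str
    met: bool  # assumed binary verdict from a grader


@dataclass
class Task:
    task_id: str
    rubric: list[Criterion] = field(default_factory=list)

    def passed(self) -> bool:
        # Assumption: a task passes only if it has a rubric and
        # every criterion in that rubric is met.
        return bool(self.rubric) and all(c.met for c in self.rubric)


@dataclass
class World:
    name: str
    tasks: list[Task] = field(default_factory=list)


def pass_rate(worlds: list[World]) -> float:
    """Fraction of all tasks across all worlds that pass (assumed scoring rule)."""
    tasks = [t for w in worlds for t in w.tasks]
    if not tasks:
        return 0.0
    return sum(t.passed() for t in tasks) / len(tasks)


# Usage: one toy world with two tasks; only the first meets every criterion.
world = World("example-world", [
    Task("t1", [Criterion("Correct total", True), Criterion("Cites source file", True)]),
    Task("t2", [Criterion("Correct total", True), Criterion("Cites source file", False)]),
])
print(f"{pass_rate([world]):.1%}")  # 50.0%
```

Under this assumed all-or-nothing rule, a single missed criterion fails the whole task; a benchmark could instead award partial credit per criterion, which would produce different percentages.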

Jobs evaluated in APEX-Agents

Corporate lawyer

Drafts and reviews contracts, conducts legal research, and advises clients on regulatory and transactional matters. Collaborates with partners on litigation, mergers and acquisitions, and compliance while managing heavy workloads across cases.

Experts from Latham & Watkins, Skadden, Cravath

Rank  Model            Model ID               Score
1     GPT 5.4 (xHigh)  galapagos-alpha-xhigh  29.8%
2     GPT 5.5 (xHigh)  gpt-5.5-xhigh          29.3%
3     Opus 4.6 (Max)   claude-opus-4-6-max    26.5%

Management consultant

Analyzes industries, evaluates markets, and builds strategic or financial models to guide client decisions. Work often includes preparing presentations, drafting reports, and synthesizing research into actionable recommendations.

Experts from McKinsey, BCG, Deloitte, Accenture, EY

Rank  Model            Model ID               Score
1     GPT 5.5 (xHigh)  gpt-5.5-xhigh          44.1%
2     GPT 5.2 (xHigh)  gpt-5.2-xhigh          42.0%
3     GPT 5.4 (xHigh)  galapagos-alpha-xhigh  41.3%

Investment banking analyst

Builds financial models, values companies, and prepares pitch materials for potential deals. Responsibilities include conducting industry research, supporting transaction execution, and producing client-ready presentations under tight deadlines.

Experts from Goldman Sachs, Morgan Stanley, JPMorgan, Barclays

Rank  Model            Model ID               Score
1     GPT 5.5 (xHigh)  gpt-5.5-xhigh          41.7%
2     Opus 4.7 (Max)   claude-opus-4-7        37.2%
3     GPT 5.2 (xHigh)  gpt-5.2-xhigh          37.1%

APEX NEWSLETTER

The latest on frontier AI performance, straight to your inbox.

New benchmarks, leaderboard shifts, and research from the APEX team.

By subscribing you agree to receive updates from Mercor.
Unsubscribe anytime.