The AI Productivity Index for Agents

The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services.


The APEX-Agents leaderboard

We created APEX-Agents to evaluate agents on the real day-to-day work of professionals: investment banking analysts, management consultants, and corporate lawyers. The tasks require agents to reason, demonstrate advanced knowledge, use multiple applications, and plan over long horizons.

APEX-Agents was built in three steps. First, industry professionals spent 5-10 days creating a data-rich world based on a unique project scenario. Second, they wrote realistic, challenging tasks using the files within that world. Third, we gave agents access to each world, along with all of the software a human would use, so they could execute the tasks.

There are 33 worlds in APEX-Agents, comprising 480 tasks with grading rubrics. The entire APEX-Agents dataset is available open-source, along with Archipelago, our infrastructure service for executing and evaluating agent trajectories.


Model                    Score
GPT 5.5 (xHigh)          38.4% ± 3.9%
GPT 5.4 (xHigh)          36.0% ± 3.8%
GPT 5.2 (xHigh)          34.4% ± 3.8%
Opus 4.7 (Max)           33.9% ± 3.8%
Gemini 3.1 Pro (High)    33.5% ± 3.6%


Jobs evaluated in APEX-Agents

Corporate Lawyer

Drafts and reviews contracts, conducts legal research, and advises clients on regulatory and transactional matters. Collaborates with partners on litigation, mergers and acquisitions, and compliance while managing heavy workloads across cases.

Experts from Latham & Watkins, Skadden, Cravath

GPT 5.4 (xHigh)    29.8%
GPT 5.5 (xHigh)    29.3%
Opus 4.6 (Max)     26.5%

Management Consultant

Analyzes industries, evaluates markets, and builds strategic or financial models to guide client decisions. Work often includes preparing presentations, drafting reports, and synthesizing research into actionable recommendations.

Experts from McKinsey, BCG, Deloitte, Accenture, EY

GPT 5.5 (xHigh)    44.1%
GPT 5.2 (xHigh)    42.0%
GPT 5.4 (xHigh)    41.3%

Investment Banking Analyst

Builds financial models, values companies, and prepares pitch materials for potential deals. Responsibilities include conducting industry research, supporting transaction execution, and producing client-ready presentations under tight deadlines.

Experts from Goldman Sachs, Morgan Stanley, JPMorgan, Barclays

GPT 5.5 (xHigh)    41.7%
Opus 4.7 (Max)     37.2%
GPT 5.2 (xHigh)    37.1%
