The AI Productivity Index for Agents

The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services.


The APEX-Agents leaderboard

We created APEX-Agents to evaluate agents on the real day-to-day work of professionals: investment banking analysts, management consultants, and corporate lawyers. The tasks require agents to reason, demonstrate advanced knowledge, use multiple applications, and plan over long horizons.

APEX-Agents was built in three steps. First, industry professionals spent 5 to 10 days creating a data-rich world based on a unique project scenario. Second, they wrote realistic, challenging tasks that draw on the files within that world. Third, we gave agents access to each world, along with all of the software a human would use, so they could execute the tasks.

APEX-Agents contains 33 worlds, comprising 480 tasks and their grading rubrics. The entire APEX-Agents dataset is available open-source, along with Archipelago, our infrastructure service for executing and evaluating agent trajectories.

