Diverse datasets spanning professional, coding, research, agentic, and multimodal tasks.
No waiting: the same datasets used by the world's top AI labs, ready to license immediately.
Every dataset is pre-built, peer-reviewed, and ready to license. Sample tasks delivered same day. No custom pipeline, no 6-month wait.
Tasks are written by PhD researchers, practicing lawyers, and senior engineers, not crowdsourced.
Each task is created, vetted, and peer-reviewed by domain experts, and checked with quality automation to ensure it provides training signal.
Every dataset is built from tasks created and evaluated by our expert network.
Mercor's AI Productivity Index for Agents. Expert-built tasks run inside high-fidelity enterprise app clones, testing whether agents can navigate hundreds of files, hold context, and finish long-horizon professional work.
Domains
Featured Datasets
Mercor's AI Consumer Index. The first benchmark for everyday consumer tasks across shopping, food, gaming, and DIY, penalizing models that hallucinate prices, specs, or links instead of verifying them.
The AI Productivity Index. Rubric-graded tasks test whether models can do the real knowledge work of investment bankers, consultants, lawyers, and physicians.
A web-browsing benchmark for realistic, end-to-end search. Tasks test whether models can strategize, navigate authoritative sources, and synthesize grounded answers to questions that general knowledge alone can't answer.
Sample tasks from any dataset, delivered same day.