APEX Research Enterprise Blog

The AI Consumer Index

The AI Consumer Index (ACE) assesses whether frontier AI models can perform everyday consumer tasks in shopping, food, gaming, and DIY.

The ACE leaderboard

We created ACE because consumer applications are among the most widespread and fastest-growing uses of AI, yet existing benchmarks have paid them too little attention. ACE evaluates whether AI models can help people with everyday activities such as finding products, enjoying their hobbies, fixing items at home, and cooking. The benchmark comprises a hidden, held-out set of 400 prompts.

ACE tests frontier models with web search enabled, mimicking real-world consumer behavior. It uses a novel evaluation methodology that combines expert-crafted rubrics with hurdle criteria and grounding checks. Read the paper to learn more about our approach.
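To make the idea of rubrics with hurdle criteria and grounding checks more concrete, here is a minimal Python sketch. It is not the ACE scoring code: the `Criterion` class, the `score_response` function, and the exact rules (a failed hurdle zeroes the response, and an ungrounded criterion earns no credit) are assumptions made for illustration. The paper and the released harness describe the actual method.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric line for one response (hypothetical structure)."""
    description: str
    met: bool                # did the response satisfy this criterion?
    grounded: bool = True    # assumption: is the claim backed by a retrieved source?
    is_hurdle: bool = False  # assumption: a must-pass criterion


def score_response(criteria: list[Criterion]) -> float:
    """Return a score in [0, 1] for one response under the assumed rules."""
    if not criteria:
        raise ValueError("rubric must contain at least one criterion")
    # Assumed hurdle rule: failing any hurdle criterion zeroes the response.
    if any(c.is_hurdle and not c.met for c in criteria):
        return 0.0
    # Assumed grounding rule: a criterion only earns credit if it is grounded.
    earned = sum(1 for c in criteria if c.met and c.grounded)
    return earned / len(criteria)


# Illustrative rubric for a made-up shopping prompt.
rubric = [
    Criterion("Recommends a product within the stated budget", met=True, is_hurdle=True),
    Criterion("States the current price", met=True, grounded=False),
    Criterion("Compares at least two options", met=True),
    Criterion("Mentions warranty terms", met=False),
]
print(score_response(rubric))
```

Under these assumed rules, a response that misses the hurdle criterion scores zero no matter how many other criteria it meets, which is the gating behavior the term "hurdle" suggests.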

To support open research, we have open-sourced 80 in-distribution cases on Hugging Face, including all metadata labels. We have also released our evaluation harness for full reproducibility.

Model            Score
GPT 5 (High)     56.1% ± 3.3%
o3 Pro (High)    55.2% ± 3.2%
GPT 5.1 (High)   55.1% ± 3.2%
o3 (High)        52.9% ± 3.1%
GPT 5.2 (High)   51.5% ± 3.2%
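The leaderboard reports each score with a ± margin, but this page does not say how that margin is computed. As an illustration only, the sketch below shows one common approach: a percentile bootstrap over per-prompt scores. The function name `mean_with_bootstrap_margin`, the 95% level, and the synthetic scores are all assumptions and are not taken from ACE.

```python
import random
import statistics


def mean_with_bootstrap_margin(scores, n_resamples=2000, seed=0):
    """Return (mean, half-width of an assumed 95% percentile bootstrap interval)."""
    rng = random.Random(seed)
    n = len(scores)
    mean = statistics.fmean(scores)
    # Resample the per-prompt scores with replacement and record each mean.
    resampled_means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int(0.025 * n_resamples)]
    upper = resampled_means[int(0.975 * n_resamples) - 1]
    return mean, (upper - lower) / 2


# Synthetic per-prompt scores for 400 prompts (not real ACE data).
synthetic_scores = [0.0, 0.25, 0.5, 0.75, 1.0] * 80
mean, margin = mean_with_bootstrap_margin(synthetic_scores)
print(f"{mean:.1%} ± {margin:.1%}")
```

With margins of roughly three percentage points, the top few leaderboard entries overlap, so small differences in rank between them should be read with caution.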

Consumer activities in ACE

DIY

Tests AI models on planning home-improvement projects, recommending tools, estimating materials, and providing clear step-by-step instructions to complete tasks.

Model            Identifier     Score
GPT 5.1 (High)   gpt-5.1-high   55.8%
GPT 5 (High)     gpt-5-high     55.4%
o3 Pro (High)    o3-pro         54.2%

Food

Tests AI models on recommending recipes, creating meal plans, handling dietary restrictions, and adapting meals for specific events or groups.

Model            Identifier     Score
GPT 5 (High)     gpt-5-high     70.1%
GPT 5.2 (High)   gpt-5.2-high   64.9%
o3 Pro (High)    o3-pro         60.2%

Gaming

Tests AI models on recommending games, optimizing builds, providing strategy guidance, and configuring gaming setups.

Model            Identifier     Score
o3 Pro (High)    o3-pro         61.3%
GPT 5.1 (High)   gpt-5.1-high   61.0%
o3 (High)        o3-high        58.5%

Shopping

Tests AI models on identifying suitable products, researching their features, comparing prices, and meeting complex user constraints.

Model            Identifier     Score
o3 Pro (High)    o3-pro         45.4%
GPT 5.1 (High)   gpt-5.1-high   44.7%
o3 (High)        o3-high        44.7%