The AI Consumer Index (ACE) assesses whether frontier AI models can perform everyday consumer tasks in shopping, food, gaming, and DIY.
We created ACE because consumer applications are among the most widespread and fastest-growing uses of AI, yet existing benchmarks have paid them too little attention. ACE evaluates whether AI models can help people with everyday activities such as finding products, enjoying their hobbies, fixing items at home, and cooking. It comprises a hidden, held-out set of 400 prompts.
ACE tests frontier models with web search enabled, mimicking real-world consumer behavior. It uses a novel evaluation methodology that combines expert-crafted rubrics with hurdle criteria and grounding checks. Read the paper to learn more about our approach.
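As a rough illustration of how rubrics, hurdle criteria, and grounding checks might combine into one score, here is a minimal sketch. It is not ACE's actual scoring code: the `Criterion` fields, the rule that a failed hurdle zeroes the score, and the rule that an ungrounded criterion earns no credit are all assumptions made for this example. The paper defines the real method.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    met: bool                # grader judged the response to satisfy this criterion
    is_hurdle: bool = False  # assumption: failing any hurdle zeroes the whole score
    grounded: bool = True    # assumption: the claim is supported by a retrieved source


def score_response(criteria: list[Criterion]) -> float:
    """Return a score in [0, 1] under the assumed hurdle-and-grounding scheme."""
    # Hurdles act as a gate: one failed hurdle means no credit at all.
    if any(c.is_hurdle and not c.met for c in criteria):
        return 0.0
    rest = [c for c in criteria if not c.is_hurdle]
    if not rest:
        return 1.0
    # A non-hurdle criterion earns credit only if it is both met and grounded.
    passed = sum(1 for c in rest if c.met and c.grounded)
    return passed / len(rest)


# Hypothetical shopping-style rubric, invented for this example.
example = [
    Criterion("Recommends a product in the requested category", met=True, is_hurdle=True),
    Criterion("States a price", met=True, grounded=True),
    Criterion("Links to a product page", met=True, grounded=False),
]
print(score_response(example))  # 0.5
```

In this sketch the hurdle is passed, so the score is the share of remaining criteria that are both met and grounded: one of two, or 0.5.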
To support open research, we have open-sourced 80 in-distribution cases on Hugging Face, including all metadata labels. We have also released our evaluation harness for full reproducibility.
GPT 5.1 (High): 56%
GPT 5 (High): 55%
o3 Pro (On): 54%

GPT 5 (High): 70%
65%
o3 Pro (On): 60%

o3 Pro (On): 61%
GPT 5.1 (High): 61%
o3 (On): 59%

o3 Pro (On): 45%
GPT 5.1 (High): 45%
o3 (On): 45%