APEX-SWE: Observability

Evaluates a model's ability to diagnose and remediate real-world software engineering production failures.
