This sample task is provided for illustration only. The scores may not reflect a model's overall performance within this domain. Overall domain scores represent the average across 200 held-out prompts.
System logs and metrics data are provided.
Analyze a production incident from observability data.
Task objectives:
Review the provided logs, metrics, and traces to identify the root cause of a 30-minute service outage.
Determine which microservice failed and why.
Recommend monitoring improvements to catch similar issues earlier.
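As a rough illustration of the first two objectives, the sketch below buckets log records per minute and per service, then reports the earliest bucket whose error rate crosses a threshold. Everything in it is an assumption made for illustration: the record shape (minute, service, is_error), the function name first_failing_service, the 0.5 threshold, and the sample service names are hypothetical and do not come from the task data. The earliest service to breach is only a candidate for the root cause; traces and dependency information would still be needed to confirm it.

```python
from collections import defaultdict


def first_failing_service(records, threshold=0.5, min_requests=1):
    """Return (service, minute) for the earliest per-minute bucket whose
    error rate exceeds `threshold`, or None if no bucket breaches it.

    `records` is an iterable of (minute, service, is_error) tuples.
    This record shape is hypothetical, chosen only for the sketch.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for minute, service, is_error in records:
        key = (minute, service)
        totals[key] += 1
        if is_error:
            errors[key] += 1

    # Collect every (minute, service) bucket over the threshold and sort,
    # so the earliest minute comes first (ties broken by service name).
    breaches = sorted(
        key
        for key, total in totals.items()
        if total >= min_requests and errors[key] / total > threshold
    )
    if not breaches:
        return None
    minute, service = breaches[0]
    return service, minute


# Hypothetical sample data, not taken from any real incident.
sample_records = [
    (0, "checkout", False),
    (0, "payments", False),
    (1, "payments", True),
    (1, "payments", True),
    (1, "checkout", False),
    (2, "checkout", True),
    (2, "payments", True),
]

result = first_failing_service(sample_records)
print(result)
```

The same threshold check, run continuously per service, is one possible shape for the monitoring improvement in the third objective: alerting on the first breach rather than on downstream symptoms.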