Beta version: Information might not be fully accurate. Please report any discrepancies.

Long Context

Tests a model's ability to process, understand, and reason over very long inputs. Includes needle-in-a-haystack tests, long-document QA, and benchmarks that measure how performance degrades as context length grows.
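As a rough illustration of the needle-in-a-haystack idea, the sketch below hides one fact in filler text at a chosen depth, asks for it back, and reports accuracy per context length. Everything in it is an assumption made for illustration: the names `build_haystack`, `run_sweep`, `accuracy_by_length`, and `truncating_stub`, the filler and needle strings, and the toy stub that stands in for a real model. It is not how any benchmark listed on this page is implemented.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical test data: one repeated filler sentence and one hidden fact.
FILLER = "The sky was clear and the market opened on time."
NEEDLE = "The secret passphrase is 'indigo-42'."
QUESTION = "What is the secret passphrase?"
ANSWER = "indigo-42"


def build_haystack(num_sentences: int, depth: float) -> str:
    """Return filler text with NEEDLE inserted at a relative depth in [0, 1]."""
    sentences = [FILLER] * num_sentences
    position = round(depth * num_sentences)
    sentences.insert(position, NEEDLE)
    return " ".join(sentences)


def run_sweep(
    ask_model: Callable[[str], str],
    lengths: Iterable[int],
    depths: Iterable[float],
) -> Dict[Tuple[int, float], bool]:
    """Query the model for every (length, depth) pair and record hits."""
    depths = list(depths)
    results: Dict[Tuple[int, float], bool] = {}
    for n in lengths:
        for d in depths:
            prompt = build_haystack(n, d) + "\n\n" + QUESTION
            results[(n, d)] = ANSWER in ask_model(prompt)
    return results


def accuracy_by_length(results: Dict[Tuple[int, float], bool]) -> Dict[int, float]:
    """Average the hit rate over depths for each context length."""
    summary: Dict[int, float] = {}
    for n in sorted({length for length, _ in results}):
        hits = [ok for (length, _), ok in results.items() if length == n]
        summary[n] = sum(hits) / len(hits)
    return summary


def truncating_stub(prompt: str, window: int = 1000) -> str:
    """Toy stand-in for a model: it only 'sees' the first `window` characters."""
    visible = prompt[:window]
    return ANSWER if NEEDLE in visible else "unknown"


if __name__ == "__main__":
    sweep = run_sweep(truncating_stub, lengths=[10, 100], depths=[0.0, 0.5, 1.0])
    print(accuracy_by_length(sweep))
```

To try this against a real system, `truncating_stub` would be replaced by a function that sends the prompt to the model under test and returns its text reply. Sweeping both length and depth is what exposes degradation: a model can score perfectly on short inputs and still miss needles placed deep in long ones.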

Top Models

1. GPT-5.4

5. Claude Opus 4.5 High

Domain Info

Benchmarks: 4
Models Evaluated: 43
Categories: Long Context

Benchmarks

MRCR v2

LongBench v2

AA-LCR

Graphwalks BFS