Beta version: *Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-03-05
Context Window: 1.1M tokens
Input Cost: $2.50 per 1M tokens
Output Cost: $15.00 per 1M tokens
Parameters: Unknown (model footprint)
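As a rough illustration of how the listed per-token prices translate into a per-request cost, the sketch below multiplies token counts by the input and output rates. The helper name `estimate_cost_usd` and the example token counts are hypothetical, and the calculation assumes flat linear pricing with no caching, batching, or long-context surcharges, none of which this page specifies.

```python
# Prices copied from the listing (USD per 1M tokens).
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: flat linear pricing, no discounts or surcharges assumed."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example request: 200k input tokens, 4k output tokens (illustrative numbers).
    print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $0.56
```

Output tokens cost six times as much as input tokens at these rates, so long generations can dominate the bill even when the prompt is much larger.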
Performance Analysis // Verified Benchmarks
Humanity's Last Exam full evaluation without tools.
Humanity's Last Exam full evaluation with tool access enabled.
Graduate-Level Google-Proof Q&A Benchmark.
Abstraction and Reasoning Corpus - Level 1.
Abstraction and Reasoning Corpus - Level 2 (Extreme difficulty).
Professional-level MMMU expansion.
OCR benchmark measuring edit distance (lower is better).
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Verified desktop computer-use benchmark for end-to-end task completion.
Higher-difficulty SWE-bench subset for frontier coding agents.
Web browsing + synthesis benchmark for research agents.
Long-horizon real-world software tasks.
Telecom-domain tool-use and workflow benchmark.
Advanced mathematics benchmark with tiered difficulty.
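The OCR entry in the list above is scored by edit distance, where lower is better. The page does not say which variant the benchmark uses, so the following is only a sketch under the assumption of character-level Levenshtein distance normalized by the longer string's length. Both function names are illustrative and not taken from any benchmark's tooling.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted: str, reference: str) -> float:
    """Assumed normalization: distance divided by the longer length, in [0, 1]."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(predicted, reference) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

Under this assumed normalization, a score of 0.0 means the OCR output matches the reference exactly and 1.0 means nothing lines up.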