Beta version: *Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-04-07
Context Window: 200k tokens
Input Cost: $1.40 per 1M tokens
Output Cost: $4.40 per 1M tokens
Cache Cost: $0.26 read / Free write, per 1M
Parameters (model footprint): 744B total (40B active)
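As a rough illustration of how the per-1M-token prices above translate into a per-request cost, here is a minimal sketch. The `Pricing` class and `estimate_cost` function are hypothetical names introduced for this example, and the billing model is an assumption: cached tokens are taken to be billed at the cache read rate instead of the input rate, and `input_tokens` is taken to count only uncached input. Actual billing rules may differ.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens, copied from the listing above."""
    input_per_m: float = 1.40
    output_per_m: float = 4.40
    cache_read_per_m: float = 0.26
    cache_write_per_m: float = 0.0  # listed as "Free"


def estimate_cost(
    pricing: Pricing,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> float:
    """Estimate the USD cost of one request.

    Assumption (not stated in the listing): input_tokens counts only
    uncached input, and cached tokens are billed at the cache rates.
    """
    total = (
        input_tokens * pricing.input_per_m
        + output_tokens * pricing.output_per_m
        + cache_read_tokens * pricing.cache_read_per_m
        + cache_write_tokens * pricing.cache_write_per_m
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 150k uncached input tokens, 50k tokens read from cache, 2k output tokens.
    cost = estimate_cost(
        Pricing(),
        input_tokens=150_000,
        output_tokens=2_000,
        cache_read_tokens=50_000,
    )
    print(f"${cost:.4f}")  # prints $0.2318
```

Under these assumptions, a request that fills most of the 200k context costs well under a dollar, with the output rate dominating only when responses are long.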
Performance Analysis // Verified Benchmarks
Humanity's Last Exam - Hard reasoning benchmark without tools.
Humanity's Last Exam full evaluation with tool access enabled.
Future prediction of AIME performance levels.
Harvard-MIT Mathematics Tournament November 2025 - High difficulty competition math.
Harvard-MIT Mathematics Tournament 2026 - High difficulty competition math.
International Mathematical Olympiad style answer-only benchmark.
Cybersecurity-flavored coding benchmark in simulated environments.
Graduate-Level Google-Proof Q&A Benchmark.
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Higher-difficulty SWE-bench subset for frontier coding agents.
Web browsing + synthesis benchmark for research agents.
Multi-step workflows using Model Context Protocol.
Long horizon real-world software tasks.
Tool-use and API orchestration benchmark for assistants.