Humanity's Last Exam full evaluation without tools.
Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-04-23
Context Window: 1.1M tokens
Input Cost: $5.00 per 1M tokens
Output Cost: $30.00 per 1M tokens
Parameters: Unknown (model footprint)
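As a rough illustration of how the listed per-token prices translate into the cost of a single request, here is a minimal sketch. It assumes flat pricing at the listed rates ($5.00 per 1M input tokens, $30.00 per 1M output tokens) with no caching discounts, batch rates, or tiered pricing, none of which the listing mentions. The function name `estimate_cost_usd` and the constant names are hypothetical, chosen only for this example.

```python
# Rates as shown in the listing (USD per 1M tokens). The listing is marked
# beta, so treat these values as possibly inaccurate.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 30.00
TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under flat per-token pricing.

    Hypothetical helper: assumes no discounts or pricing tiers apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # 200,000 input tokens cost $1.00 and 10,000 output tokens cost $0.30.
    print(f"${estimate_cost_usd(200_000, 10_000):.2f}")
```

Because output tokens are priced at six times the input rate, long generations tend to dominate the bill even when the prompt is much larger than the response.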
Performance Analysis // Verified Benchmarks
Humanity's Last Exam full evaluation without tools.
Humanity's Last Exam full evaluation with tool access enabled.
Cybersecurity-flavored coding benchmark in simulated environments.
Graduate-Level Google-Proof Q&A Benchmark.
Abstraction and Reasoning Corpus - Level 1.
Abstraction and Reasoning Corpus - Level 2 (Extreme difficulty).
Professional level MMMU expansion.
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Verified desktop computer-use benchmark for end-to-end task completion.
Higher-difficulty SWE-bench subset for frontier coding agents.
Web browsing + synthesis benchmark for research agents.
Long-horizon real-world software tasks.
Telecom-domain tool-use and workflow benchmark.
Advanced mathematics benchmark with tiered difficulty.
Long-horizon software engineering tasks requiring expert-level reasoning.
Genetics and quantitative biology benchmark.
Bioinformatics and data analysis benchmark.