Resolving real-world GitHub issues. Verified subset ensures solvable issues.
Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-04-16
Context Window: 1.0M tokens
Input Cost: $5.00 per 1M tokens
Output Cost: $25.00 per 1M tokens
Parameters: Unknown (model footprint)
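The listed input and output prices can be turned into a rough per-request estimate. The sketch below assumes billing is strictly linear in token count, with no caching discounts, batch pricing, or long-context tiers, none of which this page states. The helper name `estimate_cost_usd` is hypothetical and is not part of any provider API.

```python
# Prices copied from the listing above (USD per 1M tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00
TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: linear cost estimate from the listed prices.

    Assumes every token is billed at the flat listed rate, which may not
    match the provider's actual billing rules.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_USD_PER_MTOK
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK
    return (input_cost + output_cost) / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 4,000-token reply.
    print(f"${estimate_cost_usd(200_000, 4_000):.2f}")
```

Because output tokens are priced at five times the input rate, long replies can dominate the bill even when the prompt is much larger.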
Performance Analysis // Verified Benchmarks
Resolving real-world GitHub issues. Verified subset ensures solvable issues.
Humanity's Last Exam full evaluation without tools.
Humanity's Last Exam full evaluation with tool access enabled.
Cybersecurity-flavored coding benchmark in simulated environments.
Graduate-Level Google-Proof Q&A Benchmark.
Multilingual Massive Multitask Language Understanding.
Screen understanding benchmark for GUI interaction.
Information synthesis from complex charts.
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Verified desktop computer-use benchmark for end-to-end task completion.
Higher-difficulty SWE-bench subset for frontier coding agents.
Software engineering performance across multilingual codebases.
Web browsing + synthesis benchmark for research agents.
Multi-step workflows using Model Context Protocol.