SWE-bench Verified (Coding)
74.4* / 100
Verified
Last Verified: 2026-01-29, Artificial Analysis (Independent)
Measures how well a model resolves real-world GitHub issues. The Verified subset is limited to issues confirmed to be solvable.
* Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-01-29
Context Window: 256k tokens
Input Cost: $0.09 per 1M tokens
Output Cost: $0.30 per 1M tokens
Cache Cost: $0.02 read / Free write, per 1M tokens
Parameters (model footprint): 196B total (11B active, MoE)
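The per-token prices listed above can be turned into a rough per-request estimate. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost` is hypothetical, and it assumes that tokens served from cache are billed at the cache-read rate instead of the input rate, which the card does not spell out.

```python
# Prices copied from the card, in USD per 1M tokens.
INPUT_PER_M = 0.09
OUTPUT_PER_M = 0.30
CACHE_READ_PER_M = 0.02
CACHE_WRITE_PER_M = 0.00  # listed as "Free", so it adds nothing to the total


def estimate_cost(input_tokens, output_tokens, cached_read_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_read_tokens is the part of input_tokens served
    from cache, billed at the cache-read rate rather than the input rate.
    """
    if cached_read_tokens > input_tokens:
        raise ValueError("cached_read_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_read_tokens
    total = (
        uncached_tokens * INPUT_PER_M
        + cached_read_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example request: 100,000 input tokens and 2,000 output tokens, no cache hits.
    print(f"${estimate_cost(100_000, 2_000):.4f}")
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens comes to about $0.0096, and input that is fully served from cache costs roughly a fifth as much as uncached input.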
Performance Analysis // Verified Benchmarks
SWE-bench Verified: Resolving real-world GitHub issues. Verified subset ensures solvable issues.
MMLU-Pro: A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.
GPQA: Graduate-Level Google-Proof Q&A Benchmark.
AIME 2025: American Invitational Mathematics Examination 2025 problems.
Terminal-Bench: Agent performance in realistic terminal workflows (v2.0 leaderboard).