SWE-bench Verified (Coding)
80.6 / 100
Verified
Last Verified: 2026-04-24
Source: DeepSeek V4 Announcement
Resolving real-world GitHub issues. Verified subset ensures solvable issues.
Latest Data: 2026-04-24
Context Window: 1.0M tokens
Input Cost: $0.43 per 1M tokens
Output Cost: $0.87 per 1M tokens
Parameters (model footprint): 1.6T MoE (49B activated)
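The listed rates make per-request cost a simple linear function of token counts. A minimal sketch, assuming the quoted per-1M-token prices apply flatly (no caching or batch discounts; the function name and example token counts are illustrative):

```python
# Assumed flat rates taken from the card above.
INPUT_PER_M = 0.43   # USD per 1M input tokens
OUTPUT_PER_M = 0.87  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-million-token rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 120k-token context with a 4k-token completion:
print(round(request_cost(120_000, 4_000), 4))  # → 0.0551
```

Note that output tokens cost roughly twice as much as input tokens, so long completions dominate the bill even against a large context.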
Performance Analysis // Verified Benchmarks
- Resolving real-world GitHub issues. Verified subset ensures solvable issues.
- Grade school math word problems requiring multi-step reasoning.
- A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.
- Humanity's Last Exam full evaluation without tools.
- Graduate-Level Google-Proof Q&A Benchmark.
- Agent performance in realistic terminal workflows (v2.0 leaderboard).
- Higher-difficulty SWE-bench subset for frontier coding agents.