SWE-bench Verified
Coding
77.6 / 100
Verified
Last Verified: 2026-04-01 (Mistral Medium 3.5 Announcement)
Resolving real-world GitHub issues. The Verified subset contains only issues confirmed to be solvable.
Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-04-01
Context Window: 256k tokens
Input Cost: $1.50 per 1M tokens
Output Cost: $7.50 per 1M tokens
Parameters: 128B (model footprint)
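As a rough illustration of how the per-token prices listed above translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost_usd` and the example token counts are hypothetical, chosen only for illustration. The sketch assumes simple linear pricing with no caching discounts, batching tiers, or minimum charges, none of which are stated in the listing.

```python
# Prices copied from the listing above, in US dollars per 1M tokens.
INPUT_USD_PER_MILLION = 1.50
OUTPUT_USD_PER_MILLION = 7.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the cost of one request in US dollars.

    Assumes plain linear pricing (tokens * price / 1,000,000) with no
    discounts or surcharges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 4,000-token completion.
    print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $0.33
```

Because output tokens are priced at five times the input rate, long completions dominate the cost much faster than long prompts do.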
Performance Analysis // Verified Benchmarks
Resolving real-world GitHub issues. The Verified subset contains only issues confirmed to be solvable.
500-problem math benchmark for broad quantitative reasoning.
Contamination-free coding benchmark using recent problems.
Graduate-Level Google-Proof Q&A Benchmark.
American Invitational Mathematics Examination 2025 problems.
Multi-language coding agent benchmark with editor-in-the-loop tasks.
Retail-domain tool-use and workflow benchmark from τ²-bench.
Telecom-domain tool-use and workflow benchmark.