Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-20
Context Window: 512k tokens
Input Cost: $1.50 per 1M tokens
Output Cost: $8.00 per 1M tokens
Cache Cost: $0.17 read / Free write, per 1M tokens
Parameters: Code Optimized (model footprint)
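As a rough illustration of how the listed per-token prices translate into the cost of a single request, here is a minimal Python sketch. The function name `estimate_cost` and the billing rule it uses (cached input tokens are charged at the cache-read rate instead of the input rate, and cache writes cost nothing) are assumptions made for this example; the listing does not state how the provider actually meters cached tokens.

```python
# Prices copied from the listing above, in USD per 1M tokens.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 8.00
CACHE_READ_PER_M = 0.17  # cache writes are listed as free


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_read_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption (not confirmed by the listing): tokens served from the
    cache are billed at the cache-read rate instead of the input rate.
    """
    if min(input_tokens, output_tokens, cached_read_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_read_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")

    uncached_tokens = input_tokens - cached_read_tokens
    total_micro_usd = (
        uncached_tokens * INPUT_PER_M
        + cached_read_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    # Prices are per 1M tokens, so scale the sum back down.
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # 200k input tokens (50k of them cached) and 10k output tokens.
    print(f"${estimate_cost(200_000, 10_000, 50_000):.4f}")
```

Under these assumptions, output tokens dominate the bill for generation-heavy workloads, while a high cache-hit rate mainly reduces the input side.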
Performance Analysis // Verified Benchmarks
Functional correctness of synthesized programs from docstrings.
Resolving real-world GitHub issues. Verified subset ensures solvable issues.
Contamination-free, continuously updated reasoning benchmark.
Cybersecurity-flavored coding benchmark in simulated environments.
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Verified desktop computer-use benchmark for end-to-end task completion.
Software engineering task completion in multi-step coding workflows.
Higher-difficulty SWE-bench subset for frontier coding agents.
Artificial Analysis GDPVal benchmark for knowledge-work quality.