HumanEval (Coding)
98.5 / 100
Verified
Last Verified: Unknown Date
Source: OpenAI Blog
Functional correctness of synthesized programs from docstrings.
Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-20
Context Window: 512k tokens
Input Cost: $1.50 per 1M tokens
Output Cost: $8.00 per 1M tokens
Cache Cost: $0.17 read / Free write, per 1M tokens
Parameters (model footprint): Code Optimized
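
As a rough illustration of how the per-token prices listed above combine, here is a minimal Python sketch. The function name `request_cost` and the billing rule (cached prompt tokens charged at the cache-read rate instead of the input rate) are assumptions made for illustration; the card does not say how the provider actually meters cached tokens.

```python
# Prices copied from the card, in USD per 1M tokens.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 8.00
CACHE_READ_PER_M = 0.17
CACHE_WRITE_PER_M = 0.00  # the card lists cache writes as "Free"


def request_cost(input_tokens: int, output_tokens: int, cached_read_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption (not stated on the card): tokens served from the cache are
    billed at the cache-read rate in place of the normal input rate.
    """
    if not 0 <= cached_read_tokens <= input_tokens:
        raise ValueError("cached_read_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_read_tokens
    total_per_million = (
        fresh_tokens * INPUT_PER_M
        + cached_read_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total_per_million / 1_000_000


# Example: a 200k-token prompt, 150k of it read from cache, with 2k output tokens.
cost = request_cost(200_000, 2_000, cached_read_tokens=150_000)
print(f"${cost:.4f}")
```

Under these assumptions, most of the saving comes from the cached portion of the prompt, since the cache-read rate is roughly a ninth of the input rate.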
Performance Analysis // Verified Benchmarks
Functional correctness of synthesized programs from docstrings.
Resolving real-world GitHub issues. Verified subset ensures solvable issues.
Contamination-free, continuously updated reasoning benchmark.
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Verified desktop computer-use benchmark for end-to-end task completion.
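
To make the first entry above concrete, "functional correctness from docstrings" means a generated function body is judged by running it against unit tests rather than by comparing its text to a reference. The sketch below is a simplified illustration, not the official harness: the helper `passes_tests`, the toy `add` task, and the `check(candidate)` test layout are assumptions made here, and real evaluations run untrusted code in a sandbox with timeouts.

```python
def passes_tests(candidate_source: str, test_source: str, entry_point: str) -> bool:
    """Return True if the candidate function passes every assertion in the tests.

    WARNING: exec() runs arbitrary code. This sketch has no sandbox or timeout,
    so only use it on code you trust.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)            # defines the candidate function
        exec(test_source, namespace)                 # defines check(candidate)
        namespace["check"](namespace[entry_point])   # raises on any failed assertion
    except Exception:
        return False
    return True


# Hypothetical toy task: the prompt is a signature plus a docstring.
prompt = 'def add(a, b):\n    """Return the sum of a and b."""\n'
good_completion = "    return a + b\n"
bad_completion = "    return a - b\n"

tests = (
    "def check(candidate):\n"
    "    assert candidate(1, 2) == 3\n"
    "    assert candidate(-1, 1) == 0\n"
)

results = [
    passes_tests(prompt + good_completion, tests, "add"),
    passes_tests(prompt + bad_completion, tests, "add"),
]
# Score on a 0-100 scale: the share of attempts that pass all tests.
score = 100 * sum(results) / len(results)
print(results, score)
```

A score such as the one at the top of this card would then be the percentage of benchmark problems whose generated solution passes all of its tests, assuming one attempt per problem.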