Beta version: Information might not be fully accurate. Please report any discrepancies.

OpenAI

Verified

9 benchmarks

GPT-5.3 Codex

Released 2026-02-05

Code Optimized Architecture

Verified Model Card

Latest Data: 2026-02-20

Context Window: 512k tokens

Input Cost: $1.50 per 1M tokens

Output Cost: $8.00 per 1M tokens

Cache Cost: $0.17 read / Free write, per 1M tokens
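The per-1M-token prices listed here can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the `Pricing` class and `estimate_cost` function are hypothetical names, and it assumes that cached input tokens are billed at the cache read rate instead of the regular input rate, which this page does not confirm.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # the listed prices are quoted per 1M tokens


@dataclass(frozen=True)
class Pricing:
    """Listed USD prices per 1M tokens (hypothetical helper, not an official API)."""
    input_per_m: float = 1.50
    output_per_m: float = 8.00
    cache_read_per_m: float = 0.17
    cache_write_per_m: float = 0.0  # cache writes are listed as free


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0,
                  pricing: Pricing = Pricing()) -> float:
    """Estimate the USD cost of one request.

    Assumption: `cached_tokens` is the part of `input_tokens` served from
    cache, billed at the cache read rate instead of the input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * pricing.input_per_m
             + cached_tokens * pricing.cache_read_per_m
             + output_tokens * pricing.output_per_m)
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 200k input tokens (150k of them cached) and 10k output tokens
    cost = estimate_cost(200_000, 10_000, cached_tokens=150_000)
    print(f"${cost:.4f}")  # about $0.1805 under the assumptions above
```

Under these assumptions, output tokens dominate the cost of short-prompt requests, while caching mainly helps workloads that resend a long, stable prefix.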

Parameters: Code Optimized (model footprint)

Benchmark Provenance

Performance Analysis // Verified Benchmarks

HumanEval (Coding)

98.5 / 100

Verified

Last Verified: Unknown Date

Source: OpenAI Blog

Functional correctness of synthesized programs from docstrings.

SWE-bench Verified (Coding)

72.4 / 100

Verified

Last Verified: Unknown Date

Source: OpenAI Blog

Resolving real-world GitHub issues. The Verified subset contains only issues that human reviewers confirmed to be solvable.

LiveBench (Reasoning)

74.3 / 100

Verified

Last Verified: 2026-02-20

Source: LiveBench

Contamination-free, continuously updated reasoning benchmark.

CyberGym (Coding)

77.6 / 100

Verified

Last Verified: 2026-02-05

Source: Introducing GPT-5.3 Codex

Cybersecurity-flavored coding benchmark in simulated environments.

Terminal-Bench 2.0 (Agentic)

77.3 / 100

Verified

Last Verified: Unknown Date

Source: OpenAI Blog

Agent performance in realistic terminal workflows (v2.0 leaderboard).

OSWorld-Verified (Agentic)

64.7 / 100

Verified

Last Verified: Unknown Date

Source: OpenAI Blog

Verified desktop computer-use benchmark for end-to-end task completion.

SWE-Lancer (Agentic)

81.4 / 100

Verified

Last Verified: 2026-02-05

Source: Introducing GPT-5.3 Codex

Software engineering task completion in multi-step coding workflows.

SWE-bench Pro (Agentic)

56.8 / 100

Verified

Last Verified: 2026-02-05

Source: Introducing GPT-5.3 Codex

Higher-difficulty SWE-bench-style benchmark for frontier coding agents.

GDPVal-AA (Agentic)

70.9 / 100

Verified

Last Verified: 2026-02-05

Source: Introducing GPT-5.3 Codex

Artificial Analysis GDPVal benchmark for knowledge-work quality.