
OpenAI · Verified · 26 benchmarks

GPT-5.5

Released: 2026-04-23 · Architecture: Unknown

Verified Model Card

Latest data: 2026-05-21

Context window: 1.1M tokens

Input cost: $5.00 per 1M tokens

Output cost: $30.00 per 1M tokens

Parameters (model footprint): Unknown
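To turn the per-token prices on this card into a per-request estimate, multiply each token count by its rate and divide by one million. This is a minimal sketch that assumes flat pricing: it ignores cached-input discounts, batch pricing, and any long-context surcharge the provider may apply. `request_cost_usd` is a hypothetical helper, not part of any OpenAI SDK.

```python
# Hypothetical helper: estimates the USD cost of one request from token counts.
# Assumes flat per-token pricing (no cached-input discounts or long-context surcharges).
INPUT_USD_PER_MTOK = 5.00    # from the card: $5.00 per 1M input tokens
OUTPUT_USD_PER_MTOK = 30.00  # from the card: $30.00 per 1M output tokens


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of a single request in US dollars."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Example: a 200K-token prompt that produces a 5K-token answer.
print(f"${request_cost_usd(200_000, 5_000):.2f}")  # → $1.15
```

Output tokens cost six times as much as input tokens here, so long generations dominate the bill even when prompts are large.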

Benchmark Provenance

Performance Analysis // Verified Benchmarks

LMArena ELO · Real-world

1478 / 1700

Verified

Last verified: 2026-05-21 · Source: Chatbot Arena Leaderboard

Chatbot Arena Elo score: a crowd-sourced ranking based on human preference votes.
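Arena scores sit on an Elo-style scale, where the gap between two ratings maps to an expected preference rate. The sketch below uses the textbook Elo expected-score formula (base 10, 400-point scale). LMArena fits a Bradley-Terry model to pairwise votes, so treat this as an interpretation aid rather than the leaderboard's own method; the competitor rating of 1400 is hypothetical.

```python
# Textbook Elo expected score: a logistic curve with base 10 and a 400-point scale.
# Assumption: arena ratings can be read on this scale; ties are ignored.

def expected_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# This card's text-arena rating against a hypothetical competitor rated 1400.
p = expected_win_probability(1478, 1400)
print(f"{p:.3f}")
```

Equal ratings give 0.5, and a 400-point gap gives 10:1 odds (about 0.909). Ratings are only comparable within a single arena, so the text, WebDev, Vision, and Document scores on this card should not be compared against each other this way.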

LMArena WebDev ELO · Coding

1440 / 1700

Verified

Last verified: 2026-05-21 · Source: Chatbot Arena Leaderboard

WebDev Arena Elo score: human preference ranking for web development tasks.

LMArena Vision ELO · Vision

1288 / 1700

Verified

Last verified: 2026-05-21 · Source: Chatbot Arena Leaderboard

Vision Arena Elo score: human preference ranking for multimodal vision tasks.

LMArena Document ELO · Vision

1492 / 1700

Verified

Last verified: 2026-05-21 · Source: Chatbot Arena Leaderboard

Document Arena Elo score: human preference ranking for document understanding.

HLE-Full · Science

41.4 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Humanity's Last Exam, full question set, evaluated without tools.

HLE-Full (w/ tools) · Science

52.2 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Humanity's Last Exam, full question set, evaluated with tool access enabled.

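CyberGym · Coding

81.8 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Cybersecurity benchmark in which agents must reproduce real-world software vulnerabilities, typically by generating a proof-of-concept input that triggers them.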

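GPQA Diamond · STEM

93.6 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Diamond subset of GPQA (Graduate-Level Google-Proof Q&A): the hardest, highest-quality graduate-level multiple-choice questions in biology, physics, and chemistry.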

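ARC-AGI-1 · Reasoning

95 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Abstraction and Reasoning Corpus for Artificial General Intelligence, first version: grid-based visual puzzles that test few-shot abstraction.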

ARC-AGI-2 · Reasoning

85 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Abstraction and Reasoning Corpus, second version, designed to be considerably harder than ARC-AGI-1.
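Each ARC task supplies a few input/output grid pairs (cells are integers 0 to 9) plus one or more test inputs; the solver must infer the transformation from the examples and reproduce the test output exactly. The toy task and the `flip_horizontal` rule below are hypothetical and far simpler than real ARC-AGI tasks; the sketch only shows the task layout and the exact-match criterion.

```python
from typing import Callable

Grid = list[list[int]]

# Toy task in the ARC JSON layout (hypothetical contents): mirror each row left to right.
task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [{"input": [[0, 5], [6, 0]], "output": [[5, 0], [0, 6]]}],
}


def flip_horizontal(grid: Grid) -> Grid:
    """Candidate transformation: reverse every row."""
    return [row[::-1] for row in grid]


def solves(program: Callable[[Grid], Grid], pairs: list[dict]) -> bool:
    """A candidate counts only if it reproduces every output grid exactly."""
    return all(program(pair["input"]) == pair["output"] for pair in pairs)


# Check the candidate on the training pairs, then apply it to the test input.
if solves(flip_horizontal, task["train"]):
    prediction = flip_horizontal(task["test"][0]["input"])
    print(prediction == task["test"][0]["output"])  # → True
```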

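Graphwalks BFS · Long Context

73.7 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Long-context reasoning task in which the model performs a breadth-first search (BFS) over a graph supplied in the prompt (128K-token context setting).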
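A minimal sketch of the computation Graphwalks BFS asks the model to carry out in-context. It assumes the graph is given as a directed edge list and the question asks for every node at an exact BFS depth from a start node; that matches our reading of the task, but the benchmark's exact prompt format is not reproduced here. The node names and edges are hypothetical.

```python
from collections import deque


def nodes_at_depth(edges: list[tuple[str, str]], start: str, depth: int) -> set[str]:
    """Return the nodes whose shortest-path distance from `start` is exactly `depth`.

    BFS visits nodes level by level, so the first time a node is reached is
    along a shortest path; recording that distance is enough.
    """
    adjacency: dict[str, list[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # nodes at the target depth need not be expanded further
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node, d in distance.items() if d == depth}


# Toy directed graph with hash-like node names (hypothetical, for illustration).
edges = [("a1f", "b2c"), ("a1f", "c3d"), ("b2c", "d4e"), ("c3d", "d4e"), ("d4e", "e5f")]
print(sorted(nodes_at_depth(edges, "a1f", 2)))  # → ['d4e']
```

Note that `d4e` is reachable along two paths but is reported once, at its shortest distance; tracking that correctly across hundreds of thousands of tokens of edges is what makes the benchmark hard for a model.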

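MMMU-Pro · Vision

81.2 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Harder, more robust version of MMMU (Massive Multi-discipline Multimodal Understanding), a benchmark of college-level multimodal questions.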

Terminal-Bench 2.0 · Agentic

82.7 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Agent performance on realistic terminal workflows (v2.0 leaderboard).

OSWorld-Verified · Agentic

78.7 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Verified release of OSWorld, a benchmark of end-to-end computer-use tasks in real desktop environments.

SWE-bench Pro · Agentic

58.6 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Harder, SWE-bench-style benchmark of long-horizon software engineering tasks for frontier coding agents.

BrowseComp · Agentic

84.4 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Web-browsing benchmark that tests how well research agents locate and synthesize hard-to-find information online.

MCP Atlas · Agentic

75.3 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Multi-step tool-use workflows over the Model Context Protocol (MCP).
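MCP messages are JSON-RPC 2.0: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`. This sketch only builds the request payloads for a two-step workflow. The tool names (`search_issues`, `get_issue`) and their arguments are hypothetical, and transport, session initialization, and response handling are omitted.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical multi-step workflow: one tool call per step. In a real agent the
# arguments of later steps would be derived from the results of earlier ones.
steps = [
    ("search_issues", {"query": "login timeout"}),
    ("get_issue", {"issue_id": 42}),
]
for request_id, (name, args) in enumerate(steps, start=1):
    print(make_tool_call(request_id, name, args))
```

A benchmark of this kind scores the whole chain, so an agent must pick the right tool, fill its arguments correctly, and carry results forward at every step.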

Toolathlon · Agentic

55.6 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Long-horizon, real-world tasks that span many software applications and tools.

TAU-Bench Telecom · Agentic

98 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Telecom-domain tool-use and workflow benchmark.

FrontierMath · Math

51.7 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Benchmark of unpublished, expert-written research-level mathematics problems, organized into difficulty tiers.

Expert-SWE · Coding

73.1 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Long-horizon software engineering tasks requiring expert-level reasoning.

GeneBench · STEM

25 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Genetics and quantitative biology benchmark.

BixBench · STEM

80.5 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Bioinformatics and data analysis benchmark.

Finance Agent · Agentic

60 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Financial analysis and reasoning benchmark for agentic workflows.

OfficeQA Pro · Agentic

54.1 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Advanced document reasoning and office task completion benchmark.

FrontierMath Tier 4 · Math

35.4 / 100

Verified

Last verified: 2026-04-23 · Source: Introducing GPT-5.5

Tier 4, the hardest tier of the FrontierMath advanced mathematics benchmark.