Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-18
Context Window: 400k tokens
Input Cost: $21.00 per 1M tokens
Output Cost: $168.00 per 1M tokens
Parameters (model footprint): Unknown (MoE)
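At these list rates, the dollar cost of a request follows directly from its token counts. A minimal sketch (the 20k/2k token counts are hypothetical example sizes, not from this page):

```python
# Per-token rates derived from the listed per-1M-token prices.
INPUT_RATE = 21.00 / 1_000_000    # $ per input token
OUTPUT_RATE = 168.00 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 20k-token prompt producing a 2k-token completion.
# 20_000 * $0.000021 + 2_000 * $0.000168 = $0.756
print(f"${request_cost(20_000, 2_000):.3f}")
```

Note that output tokens cost 8x input tokens here, so long completions dominate the bill even for prompt-heavy workloads.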
Performance Analysis // Verified Benchmarks
- MMLU: Massive Multitask Language Understanding; covers 57 subjects across STEM, the humanities, social sciences, and more.
- Competition mathematics: challenging problems at the AIME/IMO level.
- HumanEval: functional correctness of programs synthesized from docstrings.
- SWE-bench Verified: resolving real-world GitHub issues; the verified subset ensures issues are solvable.
- Chatbot Arena: crowd-sourced human preference ranking, reported as an Elo score.
- MMLU-Pro: a harder, more robust version of MMLU focused on complex reasoning and STEM subjects.
- Humanity's Last Exam (HLE): hard reasoning benchmark, evaluated without tools.
- GPQA: Graduate-Level Google-Proof Q&A Benchmark.
- ARC-AGI-1: Abstraction and Reasoning Corpus, level 1.
- AIME 2025: American Invitational Mathematics Examination 2025 problems.
- ARC-AGI-2: Abstraction and Reasoning Corpus, level 2 (extreme difficulty).
- MMMU-Pro: professional-level expansion of MMMU.
- CharXiv (Reasoning QA): chart-based reasoning over figures from arXiv papers.
- SWE-bench, hard subset: higher-difficulty tasks for frontier coding agents.
- Video-MMMU: video variant of MMMU for multimodal understanding and reasoning.
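The Arena ranking above is expressed as an Elo score. As a rough sketch of how a rating gap maps to an expected win rate, here is the classic Elo formula (Arena's actual leaderboard is fit with a closely related Bradley-Terry model, so this is an approximation, not their exact method):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the
    classic Elo model: a logistic curve on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Equal ratings -> 50/50; a 100-point gap -> about a 64% win rate.
print(round(elo_expected_score(1400, 1300), 2))  # 0.64
```

This is why small score differences near the top of the leaderboard correspond to nearly coin-flip human preferences.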