Beta version: *Information might not be fully accurate. Please report any discrepancies.

OpenAI

Verified

7 benchmarks

GPT-5-mini

Released: 2025-08-07

Efficient Architecture

Training: 2024-05-30

Verified Model Card

Latest Data: 2026-02-18

Context Window: 128k tokens

Input Cost: $0.25 per 1M tokens

Output Cost: $2.00 per 1M tokens

Cache Cost: $0.03 (read) / Free (write) per 1M tokens

Parameters: Efficient model footprint
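To make the listed prices concrete, the following is a minimal sketch of a per-request cost estimate. The `Pricing` class and `estimate_cost` function are hypothetical names introduced for illustration only. The sketch assumes that cached input tokens are billed at the cache read price instead of the regular input price and that cache writes are free, as the card states; actual billing rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens, copied from the card above."""
    input_per_million: float = 0.25
    output_per_million: float = 2.00
    cached_read_per_million: float = 0.03  # cache writes are listed as free


def estimate_cost(input_tokens: int,
                  output_tokens: int,
                  cached_input_tokens: int = 0,
                  pricing: Pricing = Pricing()) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_input_tokens is the part of input_tokens served
    from cache, billed at the cache read price instead of the input price.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")

    uncached_input = input_tokens - cached_input_tokens
    total_micro_usd = (
        uncached_input * pricing.input_per_million
        + cached_input_tokens * pricing.cached_read_per_million
        + output_tokens * pricing.output_per_million
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # 10k prompt tokens (4k of them cached) and 2k completion tokens.
    print(f"${estimate_cost(10_000, 2_000, cached_input_tokens=4_000):.6f}")
```

Because output tokens cost eight times as much as input tokens at these prices, long completions tend to dominate the bill even when prompts are large.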

Model Variants

1 Variant Available

GPT-5-mini High

Efficient High-Cap

Released: 2025-08-07

Benchmark Provenance

Performance Analysis // Verified Benchmarks

MMLU (5-shot)

Category: Knowledge

Score: 85.2 / 100

Verified

Last Verified: 2026-02-18

Source: LLM Stats

Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.

MATH (CoT)

Category: Math

Score: 78.5* / 100

Verified

Last Verified: 2026-02-16

Source: Artificial Analysis (Independent)

Challenging competition mathematics problems (AMC/AIME level), solved with chain-of-thought prompting.

HumanEval

Category: Coding

Score: 88* / 100

Verified

Last Verified: 2026-02-16

Source: Artificial Analysis (Independent)

Functional correctness of synthesized programs from docstrings.

SWE-bench Verified

Category: Coding

Score: 55.2* / 100

Verified

Last Verified: 2026-02-16

Source: Artificial Analysis (Independent)

Resolving real-world GitHub issues. The Verified subset contains only issues that human annotators confirmed to be solvable.

LMArena Elo

Category: Real-world

Score: 1358 / 1700

Verified

Last Verified: 2026-02-18

Source: Chatbot Arena Leaderboard

Chatbot Arena Elo score: a crowd-sourced ranking based on human preference votes between model responses.
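As a rough guide to reading an Elo-style rating, the sketch below applies the classic Elo expected-score formula to two ratings. This is an illustration under stated assumptions, not the leaderboard's own method: the function name `elo_expected_score` is hypothetical, the 400-point scale is the conventional chess value, and the leaderboard may fit its ratings with a different statistical model.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo model.

    Assumes the conventional base-10 logistic curve with a 400-point scale.
    A result of 0.5 means the two sides are evenly matched.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Rating from the card above versus a hypothetical opponent rated 1300.
    print(round(elo_expected_score(1358, 1300), 3))
```

Under this model only the rating difference matters, so the same gap implies the same expected preference rate anywhere on the scale.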

MMLU-Pro

Category: Science

Score: 84.1 / 100

Verified

Last Verified: 2026-02-18

Source: LLM Stats

A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.

GPQA Diamond

Category: STEM

Score: 75.8 / 100

Verified

Last Verified: 2026-02-18

Source: LLM Stats

Graduate-Level Google-Proof Q&A Benchmark.