Beta version: Information might not be fully accurate. Please report any discrepancies.

OpenAI | Verified | Open Weights | 6 benchmarks

GPT-oss-120b

Released: 2025-10-20 | 120B Architecture

Training: 2024-10

Verified Model Card

Latest Data: 2026-02-20

Context Window: 128k tokens

Input Cost: Free (per 1M tokens)

Output Cost: Free (per 1M tokens)

Parameters: 120B (model footprint)

Benchmark Provenance

Performance Analysis // Verified Benchmarks

MMLU (5-shot) | Knowledge

90 / 100

Verified

Last Verified: Unknown Date | OpenAI Blog

Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.

MATH (CoT) | Math

97.9 / 100

Verified

Last Verified: Unknown Date | OpenAI Blog

Challenging competition mathematics problems (AIME/IMO level).

LiveBench | Reasoning

46.09 / 100

Verified

Last Verified: 2026-02-20 | LiveBench

Contamination-free, continuously updated reasoning benchmark.

HLE | Science

19 / 100

Verified

Last Verified: Unknown Date | OpenAI Blog

Humanity's Last Exam: a hard reasoning benchmark, evaluated without tools.

GPQA Diamond | STEM

80.1 / 100

Verified

Last Verified: Unknown Date | OpenAI Blog

Graduate-Level Google-Proof Q&A Benchmark.

IFBench | Instruction Following

69.5 / 100

Verified

Last Verified: Unknown Date | OpenAI Blog

IFBench, as reported by Artificial Analysis. Evaluates precise instruction following under explicit constraints.