Beta version: information might not be fully accurate. Please report any discrepancies.

Google DeepMind | Verified | 8 benchmarks

Gemini 3 Deep Think

Released 2026-02-12 | Architecture: Unknown

Verified Model Card

Latest data: Unknown
Context window: 1.0M tokens
Input cost: $2.00 per 1M tokens
Output cost: $12.00 per 1M tokens
Parameters: Unknown
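For a rough sense of what a single request costs at these rates, here is a minimal Python sketch that applies the listed prices ($2.00 per 1M input tokens, $12.00 per 1M output tokens). It assumes flat, untiered pricing and that any reasoning ("thinking") tokens are already counted in the output total. The function name `estimate_cost_usd` is hypothetical, and real billing (tiers, caching discounts, taxes) may differ.

```python
from decimal import Decimal

# Rates from this card, in USD per 1M tokens (assumed flat and untiered).
INPUT_USD_PER_M = Decimal("2.00")
OUTPUT_USD_PER_M = Decimal("12.00")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed per-1M-token rates.

    Assumption: reasoning tokens, if any, are already included in
    output_tokens and are billed at the output rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_M / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # A 50,000-token prompt that produces 8,000 output tokens.
    print(estimate_cost_usd(50_000, 8_000))  # $0.10 input + $0.096 output
```

Under these assumptions that request comes to $0.196. `Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding error.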

Benchmark Provenance

Performance Analysis // Verified Benchmarks

MMLU (5-shot) | Knowledge
Score: 91.8 / 100 | Verified
Last verified: Unknown date | Source: Google AI Blog

Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.

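MATH (CoT) | Math
Score: 96 / 100 | Verified
Last verified: Unknown date | Source: Google AI Blog

Challenging high-school competition mathematics problems drawn from contests such as the AMC 10, AMC 12, and AIME, solved with chain-of-thought (CoT) reasoning.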

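SWE-bench Verified | Coding
Score: 76.2 / 100 | Verified
Last verified: Unknown date | Source: Google AI Blog

Resolving real-world GitHub issues from open-source Python repositories. The Verified subset is a human-reviewed set of 500 tasks screened to be well specified and solvable.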

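MMMU (Multimodal) | Multimodal
Score: 81 / 100 | Verified
Last verified: Unknown date | Source: Google AI Blog

Massive Multi-discipline Multimodal Understanding and Reasoning: college-level questions that combine images and text across many academic disciplines.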

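LMArena Elo | Real-world
Rating: 1506 Elo (shown against a 1700 display maximum) | Verified
Last verified: Unknown date | Source: Google AI Blog

Elo rating on LMArena (formerly Chatbot Arena), a crowd-sourced leaderboard built from pairwise human preference votes. Elo ratings are relative rather than percentages and have no fixed maximum.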

HLE | Science
Score: 41 / 100 | Verified
Last verified: Unknown date | Source: Google AI Blog

Humanity's Last Exam: a hard, expert-written, multi-subject reasoning benchmark, evaluated here without tools.

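GPQA Diamond | STEM
Score: 93.8 / 100 | Verified
Last verified: Unknown date | Source: Google AI Blog

Graduate-Level Google-Proof Q&A. The Diamond subset contains 198 expert-written multiple-choice questions in biology, chemistry, and physics.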

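ARC-AGI-2 | Reasoning
Score: 45.1 / 100 | Verified
Last verified: Unknown date | Source: Google AI Blog

Abstraction and Reasoning Corpus for Artificial General Intelligence, version 2: grid-based visual puzzles that test abstraction and generalization from a few examples, designed to be easy for humans but very hard for AI systems.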