Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-20
Context Window: 1.0M tokens
Input Cost: $0.50 per 1M tokens
Output Cost: $3.00 per 1M tokens
Cache Cost: $0.05 (read) / Free (write) per 1M tokens
Parameters: Speed Optimized model footprint
1 Variant Available
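As a sketch of how the listed rates combine into a per-request cost (assuming the card's prices: $0.50/M uncached input, $3.00/M output, $0.05/M cache reads, free cache writes — the function name and example token counts are illustrative, not part of the source):

```python
def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate request cost in USD from the per-1M-token rates above.

    Rates are assumptions taken from the card: uncached input $0.50/M,
    output $3.00/M, cache reads $0.05/M, cache writes free.
    """
    INPUT_RATE = 0.50 / 1_000_000
    OUTPUT_RATE = 3.00 / 1_000_000
    CACHE_READ_RATE = 0.05 / 1_000_000

    uncached = input_tokens - cached_tokens  # tokens billed at the full input rate
    return (uncached * INPUT_RATE
            + cached_tokens * CACHE_READ_RATE
            + output_tokens * OUTPUT_RATE)

# Example: 100k-token prompt, 80k of it served from cache, 2k-token reply
print(round(estimate_cost(100_000, 2_000, cached_tokens=80_000), 4))  # 0.02
```

Note how cache reads at $0.05/M make the cached portion of the prompt an order of magnitude cheaper than billing it at the full input rate.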
Performance Analysis // Verified Benchmarks
Resolving real-world GitHub issues. Verified subset ensures solvable issues.
Contamination-free, continuously updated reasoning benchmark.
Artificial Analysis aggregate intelligence index.
A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.
Humanity's Last Exam - Hard reasoning benchmark without tools.
Humanity's Last Exam full evaluation with tool access enabled.
Verified subset of SimpleQA for parametric knowledge evaluation.
Artificial Analysis aggregate math capability index.
Contamination-free coding benchmark using recent problems.
Competitive programming problems from Codeforces, ICPC, and IOI with Elo rating.
Artificial Analysis aggregate coding capability index.
Graduate-Level Google-Proof Q&A Benchmark.
Multi-Round Context Retrieval - 8-needle test.
Artificial Analysis Long Context Reasoning benchmark. Evaluates reasoning over long contexts.
Massive Multilingual Language Understanding.
Artificial Analysis IFBench. Evaluates precise instruction following with constraints.
American Invitational Mathematics Examination 2025 problems.
Abstraction and Reasoning Corpus - Level 2 (Extreme difficulty).
Physical Interaction QA across multiple languages and cultures.
Professional-level MMMU expansion.
OCR benchmark measuring edit distance (lower is better).
Screen understanding benchmark for GUI interaction.
Information synthesis from complex charts.
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Hard split of Terminal-Bench focused on tougher terminal workflows.
Long-horizon business simulation benchmark (final account balance).
Factuality benchmark across grounding, parametric knowledge, search, and multimodal settings.
Multi-step workflows using Model Context Protocol.
Long-horizon real-world software tasks.
Tool-use and API orchestration benchmark for assistants.
Telecom-domain tool-use and workflow benchmark.
Scientific programming benchmark for code synthesis and correctness.
Video variant of MMMU for multimodal understanding and reasoning.
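The OCR entry above scores by edit distance, where lower is better. A minimal sketch of that metric — standard Levenshtein distance, which is an assumption here, not necessarily the benchmark's exact scorer:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: the minimum number of
    # insertions, deletions, and substitutions needed to turn a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

A perfect transcription scores 0, so unlike the accuracy-style benchmarks in this list, lower numbers indicate better OCR output.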