Beta version: *Information might not be fully accurate. Please report any discrepancies.

StepFun | Verified | Open Weights | 5 benchmarks

Step 3.5 Flash

Released: 2026-01-29

Architecture: 196B total (11B active, MoE)

Training: 2025-01

Verified Model Card

Latest Data: 2026-01-29

Context Window: 256k tokens

Input Cost: $0.09 per 1M tokens

Output Cost: $0.30 per 1M tokens

Cache Cost: $0.02 read / Free write, per 1M tokens
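The per-token prices listed on this card can be turned into a rough per-request estimate. The sketch below is illustrative only: the function name `estimate_cost` and its parameters are hypothetical, and it assumes that tokens served from cache are billed at the cache-read rate instead of the input rate, which the card does not state explicitly.

```python
# Prices in USD per 1M tokens, copied from the card.
INPUT_PER_M = 0.09
OUTPUT_PER_M = 0.30
CACHE_READ_PER_M = 0.02  # cache writes are listed as free


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_read_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption (not confirmed by the card): cached_read_tokens are a
    subset of input_tokens and are billed at the cache-read rate
    instead of the regular input rate.
    """
    if min(input_tokens, output_tokens, cached_read_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_read_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")

    uncached_tokens = input_tokens - cached_read_tokens
    total = (uncached_tokens * INPUT_PER_M
             + cached_read_tokens * CACHE_READ_PER_M
             + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # 100k uncached input tokens plus 10k output tokens.
    print(f"${estimate_cost(100_000, 10_000):.4f}")
```

Under these assumptions, a request with 100k input tokens and 10k output tokens comes to roughly $0.012, and output tokens dominate the bill only once they exceed about 30% of the input volume.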

Parameters (model footprint): 196B total (11B active, MoE)

Benchmark Provenance

Performance Analysis // Verified Benchmarks

SWE-bench Verified (Coding)

74.4* / 100

Verified

Last Verified: 2026-01-29 | Artificial Analysis (Independent)

Resolving real-world GitHub issues. Verified subset ensures solvable issues.

MMLU-Pro (Science)

84.4* / 100

Verified

Last Verified: 2026-01-29 | Artificial Analysis (Independent)

A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.

GPQA Diamond (STEM)

83.5* / 100

Verified

Last Verified: 2026-01-29 | Artificial Analysis (Independent)

Graduate-Level Google-Proof Q&A Benchmark.

AIME 2025 (Math)

97.3* / 100

Verified

Last Verified: 2026-01-29 | Artificial Analysis (Independent)

American Invitational Mathematics Examination 2025 problems.

Terminal-Bench 2.0 (Agentic)

51* / 100

Verified

Last Verified: 2026-01-29 | Artificial Analysis (Independent)

Agent performance in realistic terminal workflows (v2.0 leaderboard).