Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-16
Context Window: 32k tokens
Input Cost: $0.60 per 1M tokens
Output Cost: $2.40 per 1M tokens
Parameters: 132B (MoE) model footprint
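The per-token rates above make request pricing a simple linear calculation. A minimal sketch, using the listed rates; the token counts in the example are hypothetical illustration values:

```python
# Rates from the listing: $0.60 per 1M input tokens, $2.40 per 1M output tokens.
INPUT_RATE = 0.60 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.40 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 10k-token prompt with a 2k-token completion:
print(f"${request_cost(10_000, 2_000):.4f}")  # prints "$0.0108"
```

Output tokens dominate the bill at these rates: each generated token costs four times as much as each prompt token.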
Performance Analysis // Verified Benchmarks
MMLU: Massive Multitask Language Understanding; covers 57 subjects across STEM, the humanities, social sciences, and more.
HumanEval: functional correctness of programs synthesized from docstrings.
GSM8K: grade-school math word problems requiring multi-step reasoning.
AIME: American Invitational Mathematics Examination; competition-level math.
Artificial Analysis Intelligence Index: aggregate intelligence score across multiple evaluations.
MMLU-Pro: a harder, more robust version of MMLU, focused on complex reasoning and STEM subjects.
HLE: Humanity's Last Exam; hard reasoning benchmark without tools.
MATH-500: 500-problem math benchmark for broad quantitative reasoning.
LiveCodeBench: contamination-free coding benchmark using recent problems.
GPQA: Graduate-Level Google-Proof Q&A benchmark.
SciCode: scientific programming benchmark for code synthesis and correctness.
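Functional-correctness coding benchmarks of this kind are commonly scored with the pass@k metric. A minimal sketch of the standard unbiased estimator introduced with HumanEval (Chen et al., 2021), where n samples are generated per problem and c of them pass the tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    randomly drawn samples (out of n generated, c correct) passes."""
    if n - c < k:
        # Fewer incorrect samples than the draw size: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples per problem, 3 passing, evaluated at k=1:
print(pass_at_k(10, 3, 1))  # prints 0.3
```

Averaging this quantity over all problems gives the benchmark's reported pass@k score; computing it from combinations rather than from the raw ratio (c/n)**k avoids the bias of naive sampling.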