Beta version: Information might not be fully accurate. Please report any discrepancies.
Math
Assesses mathematical capabilities from basic arithmetic to competition-level problems. Covers GSM8K, MATH, AIME, and specialized mathematical reasoning benchmarks.
Domain Info
- Benchmarks: 11
- Models Evaluated: 68
- Categories: Math