Beta version: Information might not be fully accurate. Please report any discrepancies.
Agents & Tools
Measures a model's ability to use external tools, follow complex instructions, operate autonomously in multi-step workflows, and function as an effective AI agent. Includes BFCL (Berkeley Function Calling Leaderboard), API-based tasks, and instruction-following benchmarks.
Domain Info
- Benchmarks: 48
- Models Evaluated: 41
- Categories: Agent, Agentic, Instruction Following