Beta version: Information might not be fully accurate. Please report any discrepancies.

Agents & Tools

Measures the ability to use external tools, follow complex instructions, operate autonomously in multi-step workflows, and function as effective AI agents. Includes BFCL (Berkeley Function Calling Leaderboard), API-based tasks, and instruction-following benchmarks.