Beta version: Information might not be fully accurate. Please report any discrepancies.
Model Context Protocol interoperability benchmark.
Score Distribution
AgentBench (agentbench)
IFEval (ifeval)
Inverse IFEval (ifeval-inverse)
IFBench (ifbench)
Verified AdvancedIF (verified-advancedif)
MultiChallenge (multichallenge)