Beta version: Information might not be fully accurate. Please report any discrepancies.
Multi-agent swarm variant of BrowseComp.
Score Distribution
AgentBench
agentbench
IFEval
ifeval
Inverse IFEval
ifeval-inverse
IFBench
ifbench
Verified AdvancedIF
verified-advancedif
MultiChallenge
multichallenge