Beta version: information might not be fully accurate. Please report any discrepancies.
BrowseComp variant with explicit context-window management.
Score Distribution
AgentBench (agentbench)
IFEval (ifeval)
Inverse IFEval (ifeval-inverse)
IFBench (ifbench)
Verified AdvancedIF (verified-advancedif)
MultiChallenge (multichallenge)