Glossary · Term

xbench

Definition

A multi-task evaluation suite for general-purpose AI agents.

A benchmark for evaluating open-source agentic search systems. It covers varied task families and is reported alongside BrowseComp and HLE (Humanity's Last Exam) when comparing open-source search agents.

Mentioned in 1 episode

  1. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents