Concept · 2 episodes

BrowseComp

Definition

BrowseComp is a benchmark that measures how well agents can answer questions requiring real web browsing: queries whose answers are hard to find, cannot be reliably recalled from a model's training data, and demand navigating live pages. It is designed to be hard for static-knowledge models and to reward persistent, effective use of browsing tools.

Episodes covering this