Definition
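BrowseComp is a benchmark that measures how well AI agents can answer questions that require real web browsing. Its questions have answers that are hard to find: they cannot be reliably recalled from a model's stored knowledge or surfaced by a single search, and locating them typically means navigating and cross-referencing multiple live pages. The benchmark is designed to be difficult for models that rely only on static knowledge, and to reward persistent, effective use of browsing tools.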