BrowseComp

Definition

Plain language

A benchmark that tests whether an AI agent can answer real web research questions by browsing.

As stated in the literature

An evaluation suite measuring open-ended web-research task completion by browsing agents over real web content.

Related variant: BrowseComp-ZH (a Chinese-language counterpart)

Why it matters: It measures whether web-browsing agents can actually do useful research on the real internet, not just on curated snapshots.

For example, a BrowseComp-style task might ask "What was the closing price of company X on the day its CEO resigned?" To answer it, the agent has to find the resignation date in news coverage on the live web and then look up the price for that date.
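The scoring side of a benchmark like this can be sketched in a few lines. This is a minimal sketch that assumes each task has a single short reference answer. It uses normalized exact-match as a simplified stand-in for whatever grader the real benchmark uses. `Task`, `normalize`, and `accuracy` are hypothetical names and are not part of any official BrowseComp harness.

```python
import re
import string
from dataclasses import dataclass


@dataclass
class Task:
    question: str
    reference: str  # short reference answer (assumed: one per task)


def normalize(text: str) -> str:
    # Lowercase, drop ASCII punctuation, collapse runs of whitespace.
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def accuracy(tasks: list[Task], answers: list[str]) -> float:
    # Fraction of tasks whose agent answer matches the reference after normalization.
    if len(tasks) != len(answers):
        raise ValueError("need exactly one answer per task")
    if not tasks:
        return 0.0
    correct = sum(
        normalize(answer) == normalize(task.reference)
        for task, answer in zip(tasks, answers)
    )
    return correct / len(tasks)


if __name__ == "__main__":
    # Hypothetical tasks and agent answers, for illustration only.
    tasks = [
        Task("Which city is the capital of France?", "Paris"),
        Task("Who wrote the first published computer program?", "Ada Lovelace"),
    ]
    answers = ["paris.", "Charles Babbage"]
    print(accuracy(tasks, answers))  # 0.5
```

Exact-match is deliberately strict. It rejects answers that are correct but phrased differently, which is one reason a real harness may use a more forgiving grader.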

Heard on the show

“Harness engineering: improve a terminal agent, and improve a web-search agent on a benchmark called BrowseComp.”

Episode 131 — Why Autonomous Research Agents Forget Their Own Lessons, and Arbor's Fix

Mentioned in 5 episodes

131
Why Autonomous Research Agents Forget Their Own Lessons, and Arbor's Fix
092
When Search Agents Don't Really Search: The Memory Shortcut Hiding in Browsing Benchmarks
083
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
051
Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
021
Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents

Related terms

agent