Glossary · Term

BrowseComp


Definition

A benchmark that tests whether an AI agent can answer real web research questions by browsing.

An evaluation suite that measures how well browsing agents locate hard-to-find information on the real web. Each question has a short reference answer that is easy to verify but difficult to find.

Related variant: BrowseComp-ZH (a Chinese-language counterpart)
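Because each question has a short reference answer, scoring an agent reduces to comparing its final answer with the reference and reporting accuracy. The sketch below is a minimal, hypothetical scorer that assumes normalized exact-match grading. It is not the official BrowseComp grading procedure, and the names `Item`, `normalize`, and `accuracy` are illustrative.

```python
import re
import string
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question with its short ground-truth answer (hypothetical schema)."""
    question: str
    reference: str


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def accuracy(items: list[Item], predictions: list[str]) -> float:
    """Fraction of predictions whose normalized form equals the normalized reference.

    Assumption: exact match after normalization stands in for the real grader.
    """
    if len(items) != len(predictions):
        raise ValueError("one prediction per item is required")
    if not items:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(item.reference)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)


if __name__ == "__main__":
    items = [Item("q1", "The Eiffel Tower"), Item("q2", "1987")]
    print(accuracy(items, ["eiffel tower.", "1986"]))  # 0.5
```

String matching is the simplest choice for short answers, but it rejects correct answers phrased differently (for example "Sept. 3" vs "3 September"), which is why a more tolerant grader is often preferable in practice.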

Mentioned in 2 episodes

  1. 051
    Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
  2. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents

Related concepts