Concept · 5 episodes

BrowseComp

Definition

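BrowseComp is a benchmark that measures how well agents can answer questions requiring real web browsing: questions whose short answers are hard to find but easy to verify once found, so that solving them usually means persistently searching and navigating many live pages rather than looking up a single fact. It is designed to be hard for models that rely only on static, pre-trained knowledge and to reward tool use that actually locates the needed information.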

Episodes covering this

131
Why Autonomous Research Agents Forget Their Own Lessons, and Arbor's Fix
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Jin, Hu, Qiu et al. · Renmin University of China · 33 min · Jun 11, 2026
092
When Search Agents Don't Really Search: The Memory Shortcut Hiding in Browsing Benchmarks
LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?
Fan, Wang, Chu et al. · Harbin Institute of Technology · 27 min · May 28, 2026
083
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
Hu, Qian, Wang et al. · GSAI · 24 min · May 26, 2026
051
Why Parallel Sampling Plateaus, and What Evidence Graphs Do Instead
Argus: Evidence Assembly for Scalable Deep Research Agents
Zhang, Su, Chen et al. · MiroMind AI · 22 min · May 18, 2026
021
Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Du, Ye, Tang et al. · Shanghai Jiao Tong University · 14 min · May 06, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.

OpenAI o3 System Card