Glossary · Term

Mind2Web

Definition

A benchmark of real web tasks used to evaluate browsing agents.

A web-agent evaluation dataset of over 2,000 tasks collected from 137 real-world websites across 31 domains, widely used as a reference benchmark for how well browsing agents generalize to unseen tasks, websites, and domains.

Mentioned in 2 episodes

  1. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
  2. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps