A benchmark in which AI agents must complete tasks on simulated websites.
A benchmark for web-based autonomous agents, providing realistic simulated websites and multi-step tasks for end-to-end evaluation.
Also called: WebArena-Lite