Glossary · Term

AndroidWorld


Definition

A benchmark that tests AI agents on real Android phone apps.

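A mobile-device agent benchmark covering 20 Android apps and 116 tasks. Success is verified from the resulting device state rather than from the agent's sequence of actions, which lets it score workflows that span multiple apps.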
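To make "state-based verification" concrete, here is a minimal Python sketch of the idea. It is not AndroidWorld's actual code or API: `DeviceState`, its fields, and `task_succeeded` are hypothetical names invented for this illustration. The checker looks only at what the device contains after the agent stops, so any path through the UI that reaches the right end state counts as a pass.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeviceState:
    """Hypothetical snapshot of app data taken after the agent finishes."""

    # Contacts app: contact name -> phone number
    contacts: dict[str, str] = field(default_factory=dict)
    # Messaging app: (recipient phone number, message body) pairs
    sent_messages: list[tuple[str, str]] = field(default_factory=list)


def task_succeeded(state: DeviceState, name: str, phone: str, body: str) -> bool:
    """Cross-app check: the contact was saved AND the message was sent.

    Only the end state is inspected; the agent's taps and swipes are never
    compared against a reference trajectory.
    """
    contact_saved = state.contacts.get(name) == phone
    message_sent = (phone, body) in state.sent_messages
    return contact_saved and message_sent


if __name__ == "__main__":
    # Illustrative end state for "save Ana as a contact, then text her".
    final_state = DeviceState(
        contacts={"Ana": "555-0100"},
        sent_messages=[("555-0100", "Running late")],
    )
    print(task_succeeded(final_state, "Ana", "555-0100", "Running late"))
```

Because the check spans two apps' data, an agent that saves the contact but never sends the message fails the task, however plausible its individual steps looked.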

Mentioned in 1 episode

  1. 017
    When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers