Definition
GAIA is a benchmark for general AI assistants. Its tasks are real-world questions that require web browsing, file handling, and multi-step reasoning, and each task is scored on whether the final answer is correct. In the original GAIA paper, human respondents scored about 92%, against about 15% for GPT-4 equipped with plugins. Strong agent stacks have historically scored well below humans, which makes the benchmark a useful frontier metric.
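To make "scored on whether the final answer is correct" concrete, here is a minimal Python sketch of final-answer scoring. It is not the official GAIA scorer: the function names (normalize, is_correct, accuracy) and the normalization rules (lowercasing, dropping most punctuation, comparing plain decimal numbers by value) are assumptions chosen for illustration.

```python
import re

# Hypothetical helper names; the normalization rules below are illustrative
# assumptions, not the benchmark's official scoring rules.
_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation except '.' and '-', collapse whitespace."""
    text = answer.strip().lower()
    text = re.sub(r"[^\w\s.\-]", "", text)    # "1,000" -> "1000", "Paris!" -> "paris"
    text = re.sub(r"\s+", " ", text).strip()
    return text.rstrip(".")                   # ignore a trailing full stop


def is_correct(predicted: str, reference: str) -> bool:
    """Return True if the final answers match after normalization."""
    p, r = normalize(predicted), normalize(reference)
    # Compare plain decimal numbers by value so that "42.0" matches "42".
    if _NUMBER.fullmatch(p) and _NUMBER.fullmatch(r):
        return float(p) == float(r)
    return p == r


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of tasks whose final answer is correct."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return hits / len(references)
```

For example, accuracy(["Paris", "1,000", "blue"], ["paris", "1000", "red"]) counts two matches out of three. Only the final answer is compared; the intermediate browsing and reasoning steps do not enter the score in this sketch.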