Concept · 1 episode

GDP-Weighted Evaluation

Definition

GDP-weighted evaluation measures AI capability by economic value rather than by raw benchmark scores: it asks how much of the world’s actual paid work, weighted by what that work is worth, the model could plausibly do. It’s an attempt to ground capability claims in something less gameable than benchmark accuracy.
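One way to picture the idea is as a weighted average: each category of work gets a success rate, and the categories are weighted by their economic value instead of being counted equally. The sketch below is only an illustration under that assumption. The function name, the category names, and every number in it are hypothetical and are not taken from the episode or the paper.

```python
from typing import Mapping


def gdp_weighted_score(
    success_rate: Mapping[str, float],
    economic_value: Mapping[str, float],
) -> float:
    """Average per-category success rates, weighted by economic value.

    Both arguments are hypothetical inputs: success_rate maps a work
    category to the fraction of its tasks the model completes, and
    economic_value maps the same category to its value in any
    consistent unit.
    """
    total_value = sum(economic_value[c] for c in success_rate)
    if total_value <= 0:
        raise ValueError("total economic value must be positive")
    weighted_sum = sum(success_rate[c] * economic_value[c] for c in success_rate)
    return weighted_sum / total_value


# Made-up numbers, for illustration only.
success_rate = {"software": 0.8, "accounting": 0.5, "nursing": 0.0}
economic_value = {"software": 2.0, "accounting": 1.0, "nursing": 5.0}

# Plain benchmark-style average: every category counts equally.
unweighted = sum(success_rate.values()) / len(success_rate)

# Value-weighted average: large categories dominate the score.
weighted = gdp_weighted_score(success_rate, economic_value)

print(f"unweighted: {unweighted:.4f}")
print(f"weighted:   {weighted:.4f}")
```

With these made-up inputs the weighted score comes out lower than the plain average, because the category with the most economic value is the one the model cannot do at all. That gap is the kind of thing a value-weighted view is meant to expose.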

Episodes covering this

017
When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers
Gym-Anything: Turn any Software into an Agent Environment
Aggarwal, Neubig, Welleck · CMU · 31 min · May 03, 2026