Definition
GDP-weighted evaluation measures AI capability not by raw benchmark scores but by economic value: the share of the world’s actual paid work that the model could plausibly do. It is an attempt to ground capability claims in something less gameable than benchmark accuracy.
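One way to make the definition concrete is to weight each category of work by its share of total paid work and average the model's success rates under those weights. The sketch below is only an illustration under stated assumptions: the names `WorkCategory`, `wage_share`, `success_rate`, `gdp_weighted_score`, and `unweighted_score` are hypothetical, the choice of wage share as the weight is an assumption, and the numbers are made up rather than measured.

```python
from dataclasses import dataclass


@dataclass
class WorkCategory:
    name: str
    wage_share: float    # assumed share of total paid work (hypothetical weight)
    success_rate: float  # assumed fraction of this work the model can do (0 to 1)


def gdp_weighted_score(categories):
    """Average success rate, weighted by each category's share of paid work."""
    total_share = sum(c.wage_share for c in categories)
    if total_share <= 0:
        raise ValueError("wage shares must sum to a positive number")
    weighted = sum(c.wage_share * c.success_rate for c in categories)
    return weighted / total_share


def unweighted_score(categories):
    """Plain average of success rates, as a raw benchmark average would report."""
    return sum(c.success_rate for c in categories) / len(categories)


# Made-up numbers for illustration only; not real economic or benchmark data.
EXAMPLE = [
    WorkCategory("software development", wage_share=0.6, success_rate=0.2),
    WorkCategory("customer support", wage_share=0.3, success_rate=0.5),
    WorkCategory("translation", wage_share=0.1, success_rate=1.0),
]

if __name__ == "__main__":
    print(f"unweighted:   {unweighted_score(EXAMPLE):.3f}")
    print(f"GDP-weighted: {gdp_weighted_score(EXAMPLE):.3f}")
```

With these invented inputs the weighted score comes out lower than the plain average, because the model's best results fall in the category with the smallest weight. That gap is the point of the weighting: doing well on work that carries little economic weight moves the score only slightly.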