mean of means · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A common but subtly wrong way to average data, where small groups get over-weighted because each group's average counts equally.

As stated in the literature

A pooling error where the per-batch loss is computed as the average of per-rank averages rather than as the sum of per-token losses divided by the total token count, which biases the gradient whenever ranks hold different numbers of tokens; a known bug class in some SFT pipelines.
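A minimal sketch of the two pooling rules, assuming each data-parallel rank holds a plain Python list of per-token losses. A real pipeline would hold tensors and typically combine the loss sum and token count across ranks with an all-reduce. The function names `mean_of_means` and `token_weighted_mean` and all loss values are hypothetical, chosen only for illustration.

```python
from typing import List

# Hypothetical per-token losses, one inner list per data-parallel rank.
RankLosses = List[List[float]]


def mean_of_means(ranks: RankLosses) -> float:
    """Buggy pooling: average each rank's mean loss with equal weight."""
    per_rank_means = [sum(r) / len(r) for r in ranks]
    return sum(per_rank_means) / len(per_rank_means)


def token_weighted_mean(ranks: RankLosses) -> float:
    """Correct pooling: total loss over all tokens divided by total token count."""
    total_loss = sum(sum(r) for r in ranks)
    total_tokens = sum(len(r) for r in ranks)
    return total_loss / total_tokens


# Uneven split: rank 0 holds 6 tokens, rank 1 holds 2 tokens.
ranks = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [3.0, 3.0]]
print(mean_of_means(ranks))        # (1.0 + 3.0) / 2 = 2.0
print(token_weighted_mean(ranks))  # (6.0 + 6.0) / 8 = 1.5

# The same 8 tokens, re-split evenly across the two ranks.
resplit = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 3.0, 3.0]]
print(mean_of_means(resplit))        # (1.0 + 2.0) / 2 = 1.5
print(token_weighted_mean(resplit))  # still 1.5
```

With 6 tokens on one rank and 2 on the other, the buggy rule gives each token on the smaller rank a weight of 1/(2·2) = 0.25 instead of 1/8, doubling its pull on the gradient. When the same tokens are split evenly, the two rules agree, so the bias (and the resulting shift in results) appears only when per-rank token counts differ.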

Also called: mean-of-means

Why it matters: It's a real bug class in training pipelines that can quietly bias gradients and produce results that don't reproduce when batch sizes change.

For example, averaging '90% accuracy on a 1000-example batch' with '50% accuracy on a 10-example batch' as if both batches were equal hides that the small batch should barely count.
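The arithmetic behind this example, as a small sketch that reads "90% of 1000" and "50% of 10" as 900 and 5 correct answers:

```python
# The two evaluation batches from the example: (number correct, number of examples).
batches = [(900, 1000), (5, 10)]  # 90% of 1000, 50% of 10

# Mean of means: each batch's accuracy counts equally, regardless of size.
mean_of_means = sum(c / n for c, n in batches) / len(batches)

# Pooled mean: total correct divided by total examples.
pooled = sum(c for c, _ in batches) / sum(n for _, n in batches)

print(f"mean of means: {mean_of_means:.1%}")  # 70.0%
print(f"pooled mean:   {pooled:.1%}")         # 89.6%
```

The 10-example batch drags the mean of means down by about 20 points, even though it holds under 1% of the examples.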

Heard on the show

“And mean-of-means doesn't equal the true mean.”

Episode 009 — How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers

Mentioned in 1 episode: 009

Related terms

gradient · loss · SFT · token