Definition
A common but subtly wrong way to average data: taking the average of per-group averages, so each group counts equally regardless of its size and every item in a small group is over-weighted.
A pooling error where the per-batch loss is computed as the average of per-rank averages rather than as the sum over tokens divided by the total token count. When per-rank token counts are uneven, this biases the gradient toward tokens on ranks that hold fewer tokens; a known bug class in some SFT pipelines.
Also called: mean-of-means
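
To make the definition concrete, here is a minimal sketch in plain Python, not taken from any particular pipeline. The per-rank token losses and the function names mean_of_means and token_weighted_mean are hypothetical, chosen only for illustration. It contrasts the biased pooling with the sum-over-tokens form on two ranks holding 2 and 6 tokens.

```python
# Hypothetical per-token losses, one inner list per rank (illustrative values).
per_rank_token_losses = [
    [2.0, 2.0],                      # rank 0: 2 tokens
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # rank 1: 6 tokens
]


def mean_of_means(groups):
    """Average of per-group averages: every group counts equally."""
    group_means = [sum(g) / len(g) for g in groups]
    return sum(group_means) / len(group_means)


def token_weighted_mean(groups):
    """Sum over all tokens divided by the total token count."""
    total_loss = sum(sum(g) for g in groups)
    total_tokens = sum(len(g) for g in groups)
    return total_loss / total_tokens


biased = mean_of_means(per_rank_token_losses)        # (2.0 + 1.0) / 2 = 1.5
pooled = token_weighted_mean(per_rank_token_losses)  # (4.0 + 6.0) / 8 = 1.25
print(biased, pooled)
```

In this toy case each token on rank 0 carries three times the weight of a token on rank 1 under the mean-of-means, which is why the two results differ. When every rank holds the same number of tokens, the two functions agree.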