Concept · 1 episode(s)

Benchmark Contamination

Definition

Benchmark contamination occurs when evaluation questions, or close variants of them, are already present in a model's training data. It inflates measured performance and confounds claims about genuine capability, which makes it a persistent threat to benchmark validity. In reasoning studies that compare original source problems against newly generated variants, contamination is a key alternative explanation whenever the original problems score markedly higher than their freshly generated counterparts.
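
The original-versus-variant comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular study: the function name `contamination_gap`, the paired input format, and the choice of a one-sided exact sign test are all assumptions made for the example.

```python
from math import comb


def contamination_gap(pairs):
    """Compare accuracy on original problems and their fresh variants.

    `pairs` is a hypothetical input format: one (original_correct,
    variant_correct) tuple of booleans per source problem.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one problem pair")

    acc_original = sum(o for o, _ in pairs) / n
    acc_variant = sum(v for _, v in pairs) / n

    # Discordant pairs: the model solved only one of the two forms.
    only_original = sum(o and not v for o, v in pairs)
    only_variant = sum(v and not o for o, v in pairs)
    discordant = only_original + only_variant

    # One-sided exact sign test (an assumed choice of test): under the null,
    # a discordant pair is equally likely to favor either form, so the count
    # favoring the original follows Binomial(discordant, 0.5).
    if discordant == 0:
        p_value = 1.0
    else:
        tail = sum(comb(discordant, k)
                   for k in range(only_original, discordant + 1))
        p_value = tail / 2 ** discordant

    return {
        "acc_original": acc_original,
        "acc_variant": acc_variant,
        "gap": acc_original - acc_variant,
        "p_value": p_value,
    }


# Made-up results for ten problem pairs, for illustration only.
example = [(True, True)] * 5 + [(True, False)] * 4 + [(False, True)] * 1
print(contamination_gap(example))
```

A large positive gap with a small p-value is consistent with contamination, but it does not establish it: the generated variants may simply be harder than the originals, so the gap is best read as a prompt for further checks.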

Episodes covering this