Glossary · Term

benchmark contamination


Definition

When the answers to a test have leaked into a model's training data, making the score misleading.

The presence of evaluation data (benchmark questions, answers, or both) in a model's training corpus, which inflates apparent benchmark performance and undermines the benchmark's value as a held-out evaluation.
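
One common way to look for contamination is to check whether long word sequences from benchmark items also appear in the training text. The following is a minimal sketch of that idea, not a method taken from the episode: the function names, the whitespace tokenization, and the 8-word window are all illustrative assumptions, and real checks typically use more careful normalization and indexing.

```python
def ngrams(text, n=8):
    """Return the set of word n-grams in text (lowercased, whitespace-split).

    Assumption: naive tokenization; texts shorter than n words yield an empty set.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items, training_docs, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the training docs.

    Hypothetical helper for illustration only.
    """
    if not benchmark_items:
        return 0.0

    # Collect every n-gram seen anywhere in the training corpus.
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)

    # An item is flagged if any of its n-grams also occurs in training data.
    flagged = [item for item in benchmark_items if ngrams(item, n) & train_ngrams]
    return len(flagged) / len(benchmark_items)


if __name__ == "__main__":
    benchmark = [
        "what is the capital of france paris is the capital",
        "two plus two equals four in ordinary base ten arithmetic",
    ]
    training = [
        "quiz dump: what is the capital of france paris is the capital of france",
    ]
    print(contamination_rate(benchmark, training))
```

A nonzero rate only signals verbatim overlap. Paraphrased or translated leaks would pass this check, so a rate of zero does not show that a benchmark is clean.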

Mentioned in 1 episode

  1. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents