temporal contrast · Glossary · AI Papers: A Deep Dive

Definition

Plain language

Training a model by comparing its newer outputs to its older ones, treating the newer as preferred.

As stated in the literature

A self-supervised training signal that constructs preference pairs across training checkpoints, labeling later outputs as preferred over earlier ones to bootstrap rubric learning without external labels.

Why it matters: It provides a cheap, scalable training signal in domains where collecting fresh human preference labels is too slow or expensive.

For example, a model's day-5 outputs can be treated as 'preferred' over its day-1 outputs to bootstrap preference training without any human raters.
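The pair construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the episode or the underlying paper: the names `PreferencePair` and `temporal_contrast_pairs` are hypothetical, and pairing every earlier checkpoint with every later one is an assumption, since the definition does not say which checkpoints get compared.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str    # output from the later checkpoint
    rejected: str  # output from the earlier checkpoint


def temporal_contrast_pairs(outputs_by_checkpoint):
    """Build preference pairs from saved checkpoint outputs.

    outputs_by_checkpoint maps a checkpoint index (e.g. a training day)
    to a dict of {prompt: output}. Hypothetical structure for illustration.
    """
    pairs = []
    steps = sorted(outputs_by_checkpoint)
    # Assumption: compare every earlier checkpoint with every later one.
    for earlier, later in combinations(steps, 2):
        early_outputs = outputs_by_checkpoint[earlier]
        late_outputs = outputs_by_checkpoint[later]
        # Only prompts answered at both checkpoints can form a pair.
        for prompt in sorted(early_outputs.keys() & late_outputs.keys()):
            old, new = early_outputs[prompt], late_outputs[prompt]
            if old == new:
                continue  # identical outputs carry no preference signal
            # The label comes purely from checkpoint order, not from raters.
            pairs.append(PreferencePair(prompt, chosen=new, rejected=old))
    return pairs


outputs = {
    1: {"Summarize the abstract.": "day-1 answer"},
    5: {"Summarize the abstract.": "day-5 answer"},
}
pairs = temporal_contrast_pairs(outputs)
```

Here `pairs` holds a single pair whose `chosen` field is the day-5 answer and whose `rejected` field is the day-1 answer. The resulting pairs could then feed any preference-based training objective; which objective to use is outside the scope of this definition.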

Heard on the show

“The trick is called temporal contrast.”

Episode 019 — When the Best Reward Model Trains the Worst Policy: Inside EvoLM

Mentioned in 1 episode

019 · When the Best Reward Model Trains the Worst Policy: Inside EvoLM

Related terms

bootstrap · checkpoint · rubric