Glossary · Term

drift probe


Definition

A small detector that reads a model's internal state to spot when it's recalling something that's gone out of date.

A linear classifier over a model's hidden states, trained to predict whether a fact has been invalidated by world changes after the model's training cutoff. At deployment time, its output serves as a staleness signal.
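The following is a minimal sketch of the "linear classifier over hidden states" idea. It assumes the hidden states have already been extracted as fixed-length vectors, one per recalled fact, and uses logistic regression as the linear classifier, which is one common choice but not necessarily the one discussed in episode 037. The function names (train_drift_probe, staleness_score, make_synthetic_states) are hypothetical. The data is synthetic: "stale" vectors are shifted along a single direction, so it stands in for real model activations only for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_drift_probe(hidden_states, labels, lr=0.1, epochs=200):
    """Fit a hypothetical linear probe by batch gradient descent.

    label 1 = fact is stale, label 0 = fact is still valid.
    Returns (weights, bias) of a logistic-regression classifier.
    """
    dim = len(hidden_states[0])
    w = [0.0] * dim
    b = 0.0
    n = len(hidden_states)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for h, y in zip(hidden_states, labels):
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def staleness_score(w, b, h):
    """Probability the probe assigns to 'this recalled fact is stale'."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def make_synthetic_states(n, dim, seed=0):
    """Synthetic stand-in for hidden states (assumption, not real activations).

    Stale facts (y=1) are shifted +2 along the first axis, valid facts -2.
    """
    rng = random.Random(seed)
    direction = [1.0] + [0.0] * (dim - 1)
    states, labels = [], []
    for k in range(n):
        y = k % 2
        shift = 2.0 if y else -2.0
        h = [rng.gauss(0.0, 1.0) + shift * component for component in direction]
        states.append(h)
        labels.append(y)
    return states, labels


# Usage: train the probe, then check how well it separates stale from valid facts.
X, y = make_synthetic_states(200, dim=8)
w, b = train_drift_probe(X, y)
accuracy = sum(
    (staleness_score(w, b, h) > 0.5) == bool(t) for h, t in zip(X, y)
) / len(y)
print(f"train accuracy: {accuracy:.2f}")
```

At deployment, the probe would be run on the hidden state produced while the model answers, and a score above some chosen threshold would flag the answer as possibly out of date. Keeping the probe linear means it only finds staleness information that is linearly readable from the hidden states, which is what makes it cheap enough to run on every response.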

Mentioned in 1 episode

  1. 037
    Why Hallucination Detectors Miss Stale Facts: A Geometric Story About What Models Know But Don't Say