Glossary · Term

teacher-forcing replay

Definition

Re-running real conversation transcripts through a model while watching what happens inside.

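An interpretability technique in which saved trajectories are fed back through an open-weight model while its intermediate activations and next-token predictions are logged. At every position the model is conditioned on the recorded tokens rather than on its own sampled output, so the original run can be analyzed mechanistically without re-generating it. It has been used to study induction heads in semantic collapse.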
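As a rough illustration of the replay loop, here is a minimal sketch in plain Python. It assumes a tiny made-up recurrent model standing in for a real open-weight transformer; `make_toy_model`, `step`, `teacher_forced_replay`, and the token IDs are all hypothetical names and values chosen for the example, not anything from the episode. The point it tries to show is that the next input always comes from the saved transcript, never from the model's own prediction.

```python
import math
import random

VOCAB = 8   # hypothetical toy vocabulary size
HIDDEN = 4  # hypothetical hidden-state width


def make_toy_model(seed=0):
    """Build random weights for a toy model (a stand-in for an open-weight LLM)."""
    rng = random.Random(seed)
    embed = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]
    unembed = [[rng.uniform(-1, 1) for _ in range(VOCAB)] for _ in range(HIDDEN)]
    return embed, unembed


def step(model, state, token):
    """Process one token: return the new hidden state and next-token log-probs."""
    embed, unembed = model
    new_state = [math.tanh(0.5 * s + e) for s, e in zip(state, embed[token])]
    logits = [
        sum(new_state[h] * unembed[h][v] for h in range(HIDDEN))
        for v in range(VOCAB)
    ]
    # Numerically stable log-softmax.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return new_state, [x - log_z for x in logits]


def teacher_forced_replay(model, transcript):
    """Feed a saved token sequence through the model and log what it does."""
    state = [0.0] * HIDDEN
    records = []
    for pos in range(len(transcript) - 1):
        # Teacher forcing: the input is the recorded token, not a model sample.
        state, log_probs = step(model, state, transcript[pos])
        target = transcript[pos + 1]
        records.append({
            "position": pos,
            "input_token": transcript[pos],
            "target_token": target,
            "activation": list(state),  # intermediate activation
            "target_logprob": log_probs[target],  # score of what was actually said
            "argmax_prediction": max(range(VOCAB), key=log_probs.__getitem__),
        })
    return records


model = make_toy_model()
saved_transcript = [3, 1, 4, 1, 5, 2, 6]  # hypothetical token IDs from a saved run
records = teacher_forced_replay(model, saved_transcript)
for r in records:
    print(r["position"], r["target_token"], round(r["target_logprob"], 3))
```

With a real open-weight model the same pattern would usually be a single batched forward pass over the whole transcript with activation hooks attached, but the logged quantities (per-position activations and the probability assigned to the token that actually came next) are the same in spirit.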

Mentioned in 1 episode

  1. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving