teacher-forcing replay · Glossary · AI Papers: A Deep Dive

Definition

Plain language

Re-running real conversation transcripts through a model while watching what happens inside.

As stated in the literature

An interpretability technique where saved trajectories are fed through an open-weight model with intermediate activations and predictions logged, enabling mechanistic analysis without re-generating; used to study induction heads in semantic collapse.

Why it matters: It separates 'what happened in the trajectory' from 'what's happening inside the model,' enabling mechanistic interpretability without rerunning expensive rollouts.

For example, researchers can re-run a long agent transcript through a model layer by layer to watch which attention heads fire when it starts hallucinating.

Heard on the show

“… The technique is called teacher-forcing replay: take real transcripts of these collapsing multi-agent conversations and feed them through …”

Episode 073 — When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving

Mentioned in 1 episode

073
When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving

Related terms

induction head mechanistic interpretability open-weight semantic collapse trajectory