Definition
A way to peek at what a model would say if it had to commit to an answer at each layer, instead of just at the end.
An interpretability technique applying the final unembedding to intermediate residual stream states to reveal how predictions evolve through depth.