Definition
An improved tool for peeking at what a model is computing layer by layer.
A trained variant of the logit lens that learns per-layer projections to the unembedding, giving cleaner intermediate-prediction read-outs than the raw projection.
An improved tool for peeking at what a model is computing layer by layer.
A trained variant of the logit lens that learns per-layer projections to the unembedding, giving cleaner intermediate-prediction read-outs than the raw projection.