Glossary · Term

tuned lens

Definition

An improved tool for peeking at what a model is computing layer by layer.

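A trained variant of the logit lens. Where the logit lens decodes a layer's hidden state directly through the model's final normalization and unembedding, the tuned lens first applies a learned per-layer affine map (a "translator"), trained so that the decoded distribution matches the model's final output. This gives cleaner intermediate-prediction read-outs than the raw projection, especially at early layers.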
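
To make the difference between the two read-outs concrete, here is a minimal sketch in NumPy. It is a toy under stated assumptions, not the reference implementation: the weights are random, the layer norm has no learned gain or bias, and the names `logit_lens`, `tuned_lens`, `A`, `b`, and `W_U` are illustrative. Training the translator is not shown; the sketch only shows where the learned map sits and that an identity translator reduces to the logit lens.

```python
import numpy as np


def layer_norm(h, eps=1e-5):
    # Normalize the last axis to zero mean and unit variance.
    # Assumption: no learned gain/bias, to keep the toy small.
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def logit_lens(h, W_U):
    # Raw projection: decode the intermediate hidden state directly.
    return softmax(layer_norm(h) @ W_U)


def tuned_lens(h, A, b, W_U):
    # Apply a per-layer affine translator (A, b) first, then decode.
    # In practice A and b would be trained for each layer; here they are inputs.
    return softmax(layer_norm(h @ A + b) @ W_U)


# Toy dimensions and random weights (hypothetical, for illustration only).
rng = np.random.default_rng(0)
d_model, vocab = 8, 20
W_U = rng.normal(size=(d_model, vocab))   # stand-in unembedding matrix
h_mid = rng.normal(size=(d_model,))       # stand-in hidden state at some layer

# An identity translator with zero bias leaves the hidden state unchanged,
# so the tuned lens collapses to the logit lens.
A_identity = np.eye(d_model)
b_zero = np.zeros(d_model)

p_logit = logit_lens(h_mid, W_U)
p_tuned = tuned_lens(h_mid, A_identity, b_zero, W_U)

print(np.allclose(p_logit, p_tuned))
```

With a real model, `h_mid` would be a hidden state taken from an intermediate layer, `W_U` the model's own unembedding, and each layer would get its own trained `A` and `b`.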

Mentioned in 1 episode

  1. 018
    Language Models Compute the Rational Move, Then Override It