Concept · 2 episode(s)

Logit Lens

← all concepts

Definition

The logit lens reads the model’s output unembedding off intermediate layers to see what the next-token distribution looks like as computation progresses. It’s a quick way to ask “at what layer does the model already know the answer?”

Episodes covering this