Concept · 4 episodes

Attention Heads

Definition

Attention heads are the individual attention computations inside a transformer block's attention layer. Each head learns its own query, key, and value projections, so it reads from a different low-dimensional subspace of the residual stream. Many heads turn out to specialize (for syntax, induction, or retrieval, for example), which makes them a natural unit of interpretability.
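To make the definition concrete, here is a minimal sketch of two heads reading the same input through separate projections. It uses plain Python with random, untrained weights. The names (`attention_head`, `w_q`, `w_k`, `w_v`) and the sizes are illustrative assumptions, and the sketch leaves out the causal mask and the output projection that a real transformer layer would apply.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def attention_head(x, w_q, w_k, w_v):
    """One head: project x to queries, keys, and values, then mix the values.

    x is seq_len x d_model; each projection is d_model x d_head.
    Returns (weights, output) with shapes seq_len x seq_len and seq_len x d_head.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose the keys
    scores = matmul(q, k_t)
    # Scaled dot-product attention: one softmax per query position.
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return weights, matmul(weights, v)


rng = random.Random(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads

# Stand-in for the residual stream at one layer: one vector per token.
x = random_matrix(seq_len, d_model, rng)

# Each head owns its own query/key/value projections.
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
    for _ in range(n_heads)
]
results = [attention_head(x, *params) for params in heads]

# Concatenate the per-head outputs back to d_model columns per token.
concat = [[value for _, out in results for value in out[i]] for i in range(seq_len)]
```

Each entry of `results` holds one head's attention pattern and output. Comparing the `weights` matrices across heads is the kind of per-head inspection that interpretability work builds on, although with random weights the patterns here carry no meaning.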

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.