Definition
Attention heads are the individual attention computations inside a transformer block; each head learns its own query/key/value projections and reads from a different subspace of the residual stream. Many heads turn out to specialize — for syntax, for induction, for retrieval — which makes them a natural unit of interpretability.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.