attention head · Glossary · AI Papers: A Deep Dive

Definition

Plain language

One of many small specialists inside a transformer that decides which earlier tokens to focus on.

As stated in the literature

One of multiple parallel sub-units in a transformer attention layer, each applying its own query, key, and value projections and computing its own attention weights over the tokens it can see (earlier tokens, in a causal language model).

Also called: attention heads, heads

Why it matters: Different heads end up specializing in different linguistic patterns, and studying them is the entry point to mechanistic interpretability.

For example, one attention head in a transformer might consistently look at the subject of the previous clause while another tracks matching brackets.
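To make the definition concrete, here is a minimal sketch of a few attention heads running side by side over the same tokens. It assumes the standard scaled dot-product formulation with a causal mask, and uses plain Python with random, untrained weights. The function names (`attention_head`, `random_matrix`) and the sizes are hypothetical choices for illustration, so the printed patterns mean nothing linguistically. They only show that each head, having its own projections, produces its own attention pattern.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, w_q, w_k, w_v):
    """One causal attention head (illustrative, not tied to any library).

    x: seq_len x d_model token vectors; w_q, w_k, w_v: d_model x d_head.
    Returns (output, weights), where weights[i][j] is how much token i
    attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(w_q[0])
    weights = []
    for i, q_i in enumerate(q):
        # Causal mask: token i only scores itself and earlier tokens (j <= i).
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_head)
            for j in range(i + 1)
        ]
        # Later positions get exactly zero weight.
        weights.append(softmax(scores) + [0.0] * (len(x) - i - 1))
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Random stand-in for learned weights or token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, d_head, n_heads = 4, 8, 2, 3  # assumed toy sizes
x = random_matrix(seq_len, d_model, rng)

# Each head gets its own projections, so each produces its own attention pattern.
heads = [
    attention_head(x, *(random_matrix(d_model, d_head, rng) for _ in range(3)))
    for _ in range(n_heads)
]
for h, (_, weights) in enumerate(heads):
    print(f"head {h}, last token attends:", [round(w, 2) for w in weights[-1]])
```

In a trained model the per-head outputs would be concatenated and projected back to the model width. Interpretability work typically inspects the `weights` matrix of each head to see which tokens it focuses on.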

Heard on the show

“At layer nineteen, at least one attention head puts the gold document's raw score first for one hundred percent of queries, at every corpus size, up to and including a million tokens.”

Episode 198 — The Model That Knows the Answer and Can't Say It

Mentioned in 10 episodes

Related concepts

Attention Heads, Circuit Analysis

Related terms

attention, token, transformer