Concept · 6 episode(s)

Attention Heads

Definition

Attention heads are the individual attention computations inside a transformer block; each head learns its own query/key/value projections and reads from a different subspace of the residual stream. Many heads turn out to specialize — for syntax, for induction, for retrieval — which makes them a natural unit of interpretability.

Episodes covering this

198
The Model That Knows the Answer and Can't Say It
Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale
Gollapudi, Gupta, Singhal et al. · UC Berkeley·17 min·Jul 03, 2026
108
The Reasoning Cliff: Why Thinking Longer Makes Models Worse at Exact Step-by-Step Tasks
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
Guo, Wu, Yiu · The University of Hong Kong·32 min·Jun 03, 2026
073
When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
Multi-LLM Systems Exhibit Robust Semantic Collapse
Kong, Lai, Piao et al. · University of Toronto·28 min·May 23, 2026
055
Why LLM Judges Flip Their Verdicts When You Change the Question Format
Judge Circuits
Feldhus, Baeumel, Golimblevskaia et al. · Technische Universität Berlin / BIFOLD·26 min·May 19, 2026
038
How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial
How LLMs Are Persuaded: A Few Attention Heads, Rerouted
Sun, Kong, Zhang et al. · Northeastern University·23 min·May 12, 2026
004
The Sycophancy Circuit That Survives Alignment Training
LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit
Pandey · Georgia Institute of Technology·29 min·May 01, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.