induction head · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A small look-back-and-copy circuit inside a transformer that finds repeated patterns and predicts what came after them last time.

As stated in the literature

An attention-head circuit identified in mechanistic interpretability that performs in-context pattern completion by attending to prior occurrences of the current token and copying their successors; implicated in semantic collapse of multi-LLM conversations.

Also called: induction heads

Why it matters: Induction heads are widely thought to underlie much of in-context learning, so understanding them helps explain how models pick up patterns from their prompts.

For example, when a transformer sees 'Alice... Alice's favorite color is blue... Alice's favorite color is', an induction head attends back to the earlier occurrence of 'is', picks up the 'blue' that followed it, and biases the next-token prediction toward 'blue'.
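The look-back-and-copy behavior in this example can be caricatured in a few lines of Python. This is only a symbolic sketch of what the circuit computes, not how a transformer implements it: a real induction head works through learned attention weights over token representations and only biases the prediction rather than returning a single answer. The function name `induction_predict` and the whitespace tokenization are assumptions made for illustration.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy stand-in for an induction head (hypothetical helper).

    Finds the most recent earlier occurrence of the final token and
    returns the token that followed it, or None if there is no match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the final one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # "Copy" the successor of the earlier occurrence.
            return tokens[i + 1]
    return None


# Whitespace-split tokens are an assumption; real models use subword tokens.
prompt = "Alice 's favorite color is blue . Alice 's favorite color is".split()
print(induction_predict(prompt))  # prints "blue"
```

Matching on the single most recent occurrence keeps the sketch short; an actual head can spread attention across several earlier occurrences and weigh them differently.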

Heard on the show

“And what they find points at a specific kind of circuit inside the model called an induction head.”

Episode 073 — When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving

Mentioned in 1 episode

073: When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving

Related terms

attention, circuit, mechanistic interpretability, prior, semantic collapse, token