Universal Transformers

Definition

A Transformer variant that runs the same block over and over, spending more computation on its input instead of stacking more distinct layers.

A 2018 architecture that applies a single Transformer block recurrently across depth, sharing weights across iterations; foundational reference for depth-recurrent and looped sequence models.
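To make the weight-sharing idea concrete, here is a minimal pure-Python sketch of depth recurrence. It is a toy under stated assumptions, not the original implementation. It uses single-head attention, a ReLU feed-forward layer, a post-norm residual layout chosen for brevity, random toy weights, and a sinusoidal "which step am I on" signal added before each pass. Inputs are assumed to already carry positional information. The number of steps is fixed here, so the adaptive per-position halting described in the original paper is left out. The names `SharedBlock`, `step_embedding` and `universal_transformer_encode` are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SharedBlock:
    """Hypothetical toy block: single-head self-attention, then a ReLU
    feed-forward layer, each with a residual connection and layer norm.
    One instance holds the only set of weights used at every depth step."""

    def __init__(self, d_model, d_ff, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.Wq = random_matrix(d_model, d_model, rng)
        self.Wk = random_matrix(d_model, d_model, rng)
        self.Wv = random_matrix(d_model, d_model, rng)
        self.W1 = random_matrix(d_ff, d_model, rng)
        self.W2 = random_matrix(d_model, d_ff, rng)

    def __call__(self, xs):
        q = [matvec(self.Wq, x) for x in xs]
        k = [matvec(self.Wk, x) for x in xs]
        v = [matvec(self.Wv, x) for x in xs]
        scale = 1.0 / math.sqrt(self.d_model)
        h = []
        for x, qi in zip(xs, q):
            # Scaled dot-product attention of this position over all positions.
            weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            attended = [sum(w * vj[c] for w, vj in zip(weights, v))
                        for c in range(self.d_model)]
            h.append(layer_norm(add(x, attended)))
        out = []
        for hi in h:
            hidden = [max(0.0, z) for z in matvec(self.W1, hi)]  # ReLU
            out.append(layer_norm(add(hi, matvec(self.W2, hidden))))
        return out


def step_embedding(t, d_model):
    """Sinusoidal signal for recurrence step t (same form as a positional encoding)."""
    return [
        math.sin(t / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(t / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]


def universal_transformer_encode(xs, block, num_steps):
    """Apply the SAME block num_steps times: depth grows, parameters do not."""
    d_model = len(xs[0])
    for t in range(num_steps):
        # Tell the shared block which recurrence step it is on.
        xs = [add(x, step_embedding(t, d_model)) for x in xs]
        xs = block(xs)
    return xs
```

Calling `universal_transformer_encode(xs, SharedBlock(8, 16), num_steps=6)` gives six rounds of processing from one set of weights. A conventional six-layer stack would instead hold six separate weight sets. Raising `num_steps` buys more computation without adding parameters.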

Mentioned in 3 episodes

085
Why Long-Context Models Might Need Compute, Not Capacity, Before Eviction
074
How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
032
A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking