Glossary · Term

Universal Transformers

Definition

A Transformer variant that runs the same block over and over, spending more computation on each input instead of stacking more layers with their own weights.

A 2018 architecture that applies a single Transformer block recurrently across depth, sharing weights across iterations; foundational reference for depth-recurrent and looped sequence models.
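The weight-sharing idea can be illustrated with a minimal pure-Python sketch. It assumes a toy single-head block with residual connections and no layer normalization. It also leaves out the per-step timestep embeddings and the optional adaptive halting that the 2018 design adds. All names (make_block, block_step, universal_forward, stacked_forward, count_params) are hypothetical. The sketch shows that applying one block for T steps costs as much compute as a T-layer stack while keeping the parameter count of a single layer.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_block(d, rng):
    """Hypothetical toy block: Q/K/V projections plus a two-matrix ReLU feed-forward."""
    def mat():
        return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]
    return {"Wq": mat(), "Wk": mat(), "Wv": mat(), "W1": mat(), "W2": mat()}

def block_step(params, xs):
    """One pass of the block over all positions: attention, then feed-forward, each with a residual."""
    d = len(xs[0])
    qs = [matvec(params["Wq"], x) for x in xs]
    ks = [matvec(params["Wk"], x) for x in xs]
    vs = [matvec(params["Wv"], x) for x in xs]
    out = []
    for q, x in zip(qs, xs):
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks])
        attended = [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(d)]
        h = [xi + ai for xi, ai in zip(x, attended)]  # residual around attention
        ff = matvec(params["W2"], [max(0.0, u) for u in matvec(params["W1"], h)])
        out.append([hi + fi for hi, fi in zip(h, ff)])  # residual around feed-forward
    return out

def universal_forward(params, xs, steps):
    """Depth recurrence: the SAME params are reused at every step."""
    for _ in range(steps):
        xs = block_step(params, xs)
    return xs

def stacked_forward(blocks, xs):
    """Conventional stack: each layer brings its own params."""
    for params in blocks:
        xs = block_step(params, xs)
    return xs

def count_params(blocks):
    """Total number of weights across a list of blocks."""
    return sum(len(W) * len(W[0]) for params in blocks for W in params.values())

# Usage: same depth (6 applications), very different parameter budgets.
rng = random.Random(0)
d, seq_len, depth = 4, 3, 6
xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(seq_len)]

shared = make_block(d, rng)
ut_out = universal_forward(shared, xs, steps=depth)

stack = [make_block(d, rng) for _ in range(depth)]
stack_out = stacked_forward(stack, xs)

print(count_params([shared]))  # 5 matrices * 4 * 4 = 80
print(count_params(stack))     # 6 layers * 80 = 480
```

Because the loop reuses `shared`, raising `steps` buys more computation per token without adding any weights. Equivalently, the recurrent forward pass is a stack whose layers all point at the same block.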

Mentioned in 3 episodes

  1. 085
    Why Long-Context Models Might Need Compute, Not Capacity, Before Eviction
  2. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
  3. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking