Definition
A Transformer variant that applies the same layers to its input repeatedly, adding computation through extra passes rather than through more stacked layers.
A 2018 architecture that applies a single Transformer block recurrently across depth, sharing weights across iterations; foundational reference for depth-recurrent and looped sequence models.
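
The weight sharing described above can be illustrated with a minimal sketch, assuming a heavily simplified block: single-head self-attention plus a ReLU feed-forward layer, each with a residual connection. It omits layer normalization, multiple heads, and any per-step embeddings or halting mechanism, so it is not the published architecture. All names here (`init_block`, `block`, `looped_forward`, `stacked_forward`) and the toy sizes are hypothetical, chosen only for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def init_block(d_model, d_ff, rng):
    """Create one parameter set for the simplified block (hypothetical layout)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"wq": mat(d_model, d_model), "wk": mat(d_model, d_model),
            "wv": mat(d_model, d_model), "w1": mat(d_model, d_ff),
            "w2": mat(d_ff, d_model)}

def block(x, p):
    """Self-attention then feed-forward, each added back to its input."""
    d = len(x[0])
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    kt = [list(col) for col in zip(*k)]  # transpose of k
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
    attn = matmul([softmax(row) for row in scores], v)
    h = [[a + b for a, b in zip(xr, ar)] for xr, ar in zip(x, attn)]
    hidden = [[max(0.0, u) for u in row] for row in matmul(h, p["w1"])]
    ff = matmul(hidden, p["w2"])
    return [[a + b for a, b in zip(hr, fr)] for hr, fr in zip(h, ff)]

def looped_forward(x, shared, steps):
    """Depth recurrence: the same parameters are reused at every step."""
    for _ in range(steps):
        x = block(x, shared)
    return x

def stacked_forward(x, layers):
    """Conventional stack: each layer has its own parameters."""
    for p in layers:
        x = block(x, p)
    return x

def count_params(p):
    """Count scalar weights in one parameter set."""
    return sum(len(m) * len(m[0]) for m in p.values())

rng = random.Random(0)
d_model, d_ff, steps = 4, 8, 6
shared = init_block(d_model, d_ff, rng)
layers = [init_block(d_model, d_ff, rng) for _ in range(steps)]
x = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(3)]  # 3 tokens

y_loop = looped_forward(x, shared, steps)
y_stack = stacked_forward(x, layers)
loop_params = count_params(shared)
stack_params = sum(count_params(p) for p in layers)
```

Both forward passes perform the same number of block applications, but `loop_params` stays fixed as `steps` grows, while `stack_params` scales linearly with depth. In this sketch the number of iterations is a runtime argument, so depth can be changed without adding parameters.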