equilibrium internalization

Definition

Plain language

The effect in which training an iterative-refinement model teaches the underlying network to produce the final answer in a single pass, so the refinement loop becomes unnecessary at inference.

As stated in the literature

An emergent phenomenon in attractor-model training where the high-capacity backbone learns to initialize the residual stream at the fixed point itself, so the equilibrium solver requires few or zero iterations at test time.

Why it matters: If a costly iterative procedure collapses into a single forward pass after training, you keep the accuracy gained from refinement without paying its runtime cost in deployment.

For example, a model trained to iteratively refine its answer ten times eventually learns to produce the right answer on the first pass, so the refinement loop can be skipped at inference.
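The idea can be illustrated with a toy sketch. The code below is not taken from the episode or from any paper: the `refine` map, the scalar state, and the tolerance are all hypothetical stand-ins chosen only to show how a solver's iteration count depends on where it starts. A "cold" start far from the fixed point needs many updates; a start already at the fixed point, which is what an internalized backbone would supply, needs none.

```python
def solve_fixed_point(f, z0, tol=1e-6, max_iters=100):
    """Iterate z <- f(z) until one update changes z by less than tol.

    Returns (z, n_iters), where n_iters is the number of updates
    applied before convergence was detected.
    """
    z = z0
    for n in range(max_iters):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z, n
        z = z_next
    return z, max_iters


def refine(z, x):
    # Hypothetical contractive refinement step; its fixed point is z* = 2 * x.
    return 0.5 * z + x


x = 3.0
step = lambda z: refine(z, x)

# Cold start: the initial state is far from the equilibrium.
_, cold_iters = solve_fixed_point(step, z0=0.0)

# Internalized start (assumed): the initial state already is the fixed point.
_, warm_iters = solve_fixed_point(step, z0=2.0 * x)

print("cold start iterations:", cold_iters)
print("internalized start iterations:", warm_iters)
```

In this toy setting the solver is still available as a safety net, but with the internalized start it exits on its first convergence check, which is the sense in which the loop "can be skipped."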

Heard on the show

“The other thing I'd flag, and the authors acknowledge it themselves, is that equilibrium internalization is observed, not explained.”

Episode 041 — When the Iteration Teaches the Model to Skip the Iteration

Mentioned in 1 episode

041
When the Iteration Teaches the Model to Skip the Iteration

Related terms

attractor, backbone, fixed point, residual stream