phantom gradients · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A way of training fixed-point models that approximates the gradient cheaply instead of running the full implicit-gradient computation.

As stated in the literature

An approximation technique for training implicit models that estimates gradients through the equilibrium without the full inverse-Jacobian solve, used in attractor-model reasoning experiments.
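As a rough sketch of what "without the full inverse-Jacobian solve" means, the block below compares the exact implicit gradient with a truncated, damped approximation. The symbols are assumptions introduced here for illustration and do not come from the episode: f is the layer, z* its fixed point, θ the parameters, J the Jacobian of f with respect to z at z*, λ a damping factor and k the number of unrolled steps.

```latex
% Fixed point:  z^* = f(z^*, \theta), \qquad J = \partial f / \partial z \,\big|_{z^*}

% Exact implicit gradient (requires an inverse-Jacobian solve):
\frac{\partial \mathcal{L}}{\partial \theta}
  = \frac{\partial \mathcal{L}}{\partial z^*}
    \,(I - J)^{-1}\,
    \frac{\partial f}{\partial \theta}

% Approximation from k damped unrolled steps started at z^*:
(I - J)^{-1} \;\approx\; \lambda \sum_{i=0}^{k-1} \bigl(\lambda J + (1-\lambda) I\bigr)^{i}
```

When the damped iteration is a contraction, the sum approaches the exact inverse as k grows, so a small k trades accuracy for cost.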

Why it matters: It makes deep-equilibrium and attractor models practical to train at scale, where the exact implicit-gradient computation, which requires solving a linear system involving the Jacobian at the fixed point, would be too expensive.

For example, instead of backpropagating through a costly fixed-point solver, the trainer estimates the gradient with a short unrolled approximation.
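The idea in the example above can be illustrated with a toy scalar model. This is a minimal sketch under stated assumptions, not the method from any particular paper or library: the layer f(z) = tanh(w*z + x), the function names, the damping value and the step counts are all hypothetical, and the derivatives are written out by hand where a real trainer would use automatic differentiation.

```python
import math

def f(z, w, x):
    """Toy fixed-point layer (hypothetical): z = tanh(w*z + x)."""
    return math.tanh(w * z + x)

def solve_fixed_point(w, x, iters=200):
    """Forward pass: plain fixed-point iteration (converges here since |w| < 1)."""
    z = 0.0
    for _ in range(iters):
        z = f(z, w, x)
    return z

def partials(z, w, x):
    """Hand-written partial derivatives (df/dz, df/dw) at the point z."""
    s = 1.0 - math.tanh(w * z + x) ** 2
    return s * w, s * z

def exact_implicit_grad(w, x):
    """Exact dz*/dw via the implicit function theorem: b / (1 - a)."""
    z = solve_fixed_point(w, x)
    a, b = partials(z, w, x)
    return b / (1.0 - a)

def phantom_grad(w, x, k=5, damping=0.8):
    """Approximate dz*/dw from k damped unrolled steps started at z*.

    The starting point z* is treated as a constant, so the derivative
    accumulator g starts at 0 and no inverse (1 - a)^-1 is ever computed.
    """
    z = solve_fixed_point(w, x)
    a, b = partials(z, w, x)
    g = 0.0
    for _ in range(k):
        # Derivative of the step z <- (1 - damping) * z + damping * f(z, w)
        g = (1.0 - damping) * g + damping * (a * g + b)
    return g

if __name__ == "__main__":
    w, x = 0.5, 0.3
    print("exact  :", exact_implicit_grad(w, x))
    for k in (1, 5, 50):
        print(f"k = {k:2d} :", phantom_grad(w, x, k=k))
```

In this toy setting the approximation approaches the exact value as k grows; the practical appeal is that a small k already gives a usable training signal at a fraction of the cost.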

Heard on the show

“… deep-supervision scheme borrowed from TRM — and the backward pass uses a cheaper technique called phantom gradients. …”

Episode 041 — When the Iteration Teaches the Model to Skip the Iteration

Mentioned in 1 episode

041
When the Iteration Teaches the Model to Skip the Iteration

Related terms

attractor gradient Jacobian