halting probe · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A small detector that reads a model's internal state to predict whether more thinking will help or hurt.

As stated in the literature

A learned classifier that reads intermediate hidden states to predict whether additional iterative refinement will improve or degrade a model's answer.

Why it matters

Knowing when to stop thinking saves compute and avoids the surprisingly common case where extra deliberation makes an answer worse.

For example, a halting probe might read a model's hidden state after three reasoning iterations and predict that further iterations will start degrading the answer, so the system stops there.
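The stopping behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the method from any particular paper: it assumes the probe is a linear classifier with a sigmoid output, that its weights have already been learned elsewhere, and that the hidden state is a plain list of floats. The names `probe_score`, `refine_with_halting`, `step`, and `threshold` are hypothetical and chosen only for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple


def probe_score(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear probe: estimated probability that one more iteration helps."""
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def refine_with_halting(
    state: List[float],
    step: Callable[[List[float]], List[float]],
    weights: Sequence[float],
    bias: float,
    max_iters: int = 8,
    threshold: float = 0.5,
) -> Tuple[List[float], int]:
    """Apply `step` repeatedly, stopping once the probe predicts no further benefit.

    Returns the final state and the number of iterations actually run.
    """
    iters = 0
    while iters < max_iters:
        # Read the current hidden state before deciding to spend more compute.
        if probe_score(state, weights, bias) < threshold:
            break
        state = step(state)
        iters += 1
    return state, iters


# Toy setup (assumed values): each "reasoning iteration" lowers the single
# hidden feature by 1, and the probe's score falls below 0.5 once it reaches 1.
final_state, iters_run = refine_with_halting(
    state=[4.0],
    step=lambda s: [s[0] - 1.0],
    weights=[1.0],
    bias=-1.5,
)
# In this toy setup the loop halts after 3 iterations.
```

In a real system the probe would be trained on hidden states labelled by whether further refinement improved or degraded the answer, and `step` would be one pass of the model's iterative refinement. The threshold controls the trade-off between saving compute and stopping too early.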

Heard on the show

“Which is exactly what the halting probe is for.”

Episode 032 — A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking

Mentioned in 1 episode

032
A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking

Related terms

classifier
hidden state