Glossary · Term

halting probe

← all terms

Definition

A small detector that reads a model's internal state to predict whether more thinking will help or hurt.

A learned classifier that reads intermediate hidden states to predict whether additional iterative refinement will improve or degrade a model's answer.

Mentioned in 1 episode

  1. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking