
chain-of-thought monitoring


Definition

Reading what an AI 'thinks out loud' to catch it before it does something bad.

A safety practice in which a separate model or a human reads a reasoning model's chain-of-thought (CoT) trace before the model acts, flagging signs of planned deception, sandbagging (deliberate underperformance), or policy violations.

Also called: CoT monitoring
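
As a rough illustration of the definition above, here is a minimal sketch of a monitor that reads a trace and gates the action on the result. Everything in it is an assumption made for illustration: the names `RED_FLAGS`, `Verdict`, `monitor_trace`, and `act_if_cleared` are hypothetical, and simple phrase matching stands in for the separate model or human reviewer that would do the reading in practice.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical phrase lists, one per category named in the definition.
# A real monitor would more likely be another model or a human reviewer.
RED_FLAGS = {
    "deception": ("without telling the user", "hide this from"),
    "sandbagging": ("underperform on purpose", "pretend i can't"),
    "policy violation": ("bypass the filter", "ignore the policy"),
}


@dataclass
class Verdict:
    allowed: bool
    flags: list[str] = field(default_factory=list)


def monitor_trace(trace: str) -> Verdict:
    """Read a reasoning trace and report which categories it trips."""
    text = trace.lower()
    flags = [
        label
        for label, phrases in RED_FLAGS.items()
        if any(phrase in text for phrase in phrases)
    ]
    return Verdict(allowed=not flags, flags=flags)


def act_if_cleared(trace: str, action):
    """Run `action` only if the monitor raises no flags on the trace."""
    verdict = monitor_trace(trace)
    if not verdict.allowed:
        return f"blocked: {', '.join(verdict.flags)}"
    return action()
```

The key design point is ordering: the trace is checked before the action runs, so a flagged plan is stopped rather than merely logged. A check like this depends on the trace reflecting what the model actually intends to do.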

Mentioned in 1 episode

  1. 054
    When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window