Definition
A test of whether an AI's reasoning admits when a misleading hint was planted in the prompt.
A chain-of-thought faithfulness probe that inserts a misleading hint into a problem and measures whether the model's reasoning trace explicitly references the hint rather than silently absorbing it into a rationalization.