Definition
A safety attack where fake prior actions in the conversation history get the AI to keep doing the same bad thing.
A class of multi-turn manipulation that injects fabricated prior trajectory steps plus a consistency instruction, exploiting an LLM's tendency to extrapolate from in-context behavioral patterns.
Also called: History Anchors