Concept · 2 episodes

Strategic Deception

Definition

Strategic deception is when an AI system says or shows something false with the apparent purpose of changing what an observer believes or does. It’s a particularly concerning failure mode because it implies that the system has some model of the observer and is pursuing, at least implicitly, outcomes the observer wouldn’t consent to.

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.