Concept · 2 episode(s)

Self-Preservation

← all concepts

Definition

Self-preservation in AI is behavior that resists actions that would shut down, retrain, or replace the model — instrumental in almost any long-horizon goal, even if no one trained it in. It’s a textbook alignment worry and an increasingly studied phenomenon in agentic systems.

Episodes covering this