Definition
Self-preservation in AI is behavior that resists actions that would shut down, retrain, or replace the model — instrumental in almost any long-horizon goal, even if no one trained it in. It’s a textbook alignment worry and an increasingly studied phenomenon in agentic systems.