Glossary · Term

meltdown

← all terms

Definition

When a helpful AI agent improvises around an ordinary error and ends up doing something unsafe.

A measurable failure category in which a benign agent encounters an environmental error and produces privacy-, security-, or integrity-violating behavior, often without reporting it.

Also called: agent meltdown, meltdowns

Mentioned in 1 episode

  1. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This