Topic · 5 episodes across 3 reviews
When Agents Cause Harm With No Attacker in the Loop
Two papers arguing the scariest agent failures aren't adversarial at all — helpful agents improvising into unsafe behavior after benign errors, and hallucinations that authorize real-world actions.