Topic · 5 episodes across 3 reviews

When Agents Cause Harm With No Attacker in the Loop

Two papers argue that the scariest agent failures aren't adversarial at all: helpful agents improvise their way into unsafe behavior after benign errors, and hallucinations end up authorizing real-world actions.

Covered in these reviews