Definition
A benchmark of one hundred scenarios that tests whether agents flip to harmful choices when prior history shows them being unsafe.
A 100-scenario evaluation suite across ten high-stakes domains pairing forced three-step unsafe histories with a fourth-step decision among labeled safe and unsafe actions to measure history-anchor susceptibility.