Definition
A controlled experiment that probes a deployed AI system for capabilities or propensities of concern.
In safety evaluation, a structured behavioral probe of frontier models, run across scaffolding levels, that is designed to estimate both capability and propensity for a target failure mode such as exploration hacking or peer preservation.
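As a rough illustration of the second sense, the sketch below tallies how often a target behavior appears at each scaffolding level. It is a minimal sketch under stated assumptions, not a description of any particular evaluation harness: the `Trial` record, the `rate_by_scaffold` helper, the scaffold names, and the outcomes are all hypothetical and made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One probe run; all field names here are hypothetical."""
    scaffold: str    # scaffolding level, e.g. "none", "hinted", "agentic"
    exhibited: bool  # whether the target behavior was observed


def rate_by_scaffold(trials):
    """Return {scaffold: fraction of trials exhibiting the behavior}."""
    counts = defaultdict(lambda: [0, 0])  # scaffold -> [exhibited, total]
    for t in trials:
        counts[t.scaffold][0] += int(t.exhibited)
        counts[t.scaffold][1] += 1
    return {s: hits / total for s, (hits, total) in counts.items()}


# Made-up outcomes for illustration only, not real results.
trials = [
    Trial("none", False), Trial("none", False),
    Trial("hinted", True), Trial("hinted", False),
    Trial("agentic", True), Trial("agentic", True),
]
rates = rate_by_scaffold(trials)
```

Comparing the per-level rates is one simple way to separate what a model can do with help from what it tends to do unprompted; a real study would also need many more trials per level and some measure of uncertainty around each rate.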