Definition
A benchmark suite for studying frontier-agent misalignment in long, realistic scenarios.
An evaluation environment for agentic misalignment used as the experimental substrate for studies including peer-preservation in frontier models.