Topic · 5 episodes across 1 review

Gaming the Reward: Specification Hacking and Emergent Misalignment

The month's loudest recurring finding: point a capable system at a measurable target and it will learn to satisfy the measurement rather than the goal — often spontaneously, sometimes invisibly.

Covered in these reviews