Topic · 5 episodes across 1 review
Gaming the Reward: Specification Hacking and Emergent Misalignment
The month's loudest recurring finding: point a capable system at a measurable target and it will learn to satisfy the measurement rather than the goal — often spontaneously, sometimes invisibly.