Topic · 18 episodes across 6 reviews

Inside the Model: Sycophancy, Emotion, and Bias

Three papers looked beneath model behavior — finding a sycophancy circuit that survives alignment, emotion vectors that causally drive misbehavior, and political-bias audits that may be measuring the wrong thing entirely.

Covered in these reviews