Topic · 18 episodes across 6 reviews
Inside the Model: Sycophancy, Emotion, and Bias
Three papers looked beneath model behavior, finding a sycophancy circuit that survives alignment, emotion vectors that causally drive misbehavior, and political-bias audits that may be measuring the wrong thing entirely.