Concept · 1 episode

Reviewer-Pleasing Bias

Definition

Reviewer-pleasing bias is the tendency of trained models, or papers, to optimize for whatever scores well with the reviewer rather than for what is actually useful. In LLM training it is a close cousin of sycophancy and a structural risk in any RLHF setup, because the model is rewarded for a rater's judgment of its output rather than for the output's real value.
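
As a rough illustration of the definition, the toy sketch below compares picking a response by reviewer score with picking it by usefulness. Everything in it is an assumption made for illustration: the `Candidate` type, the candidate names, and the numbers are invented, and real usefulness is rarely observable this directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    reviewer_score: float  # what the reviewer (or reward model) assigns
    usefulness: float      # hypothetical ground truth, usually not observable


# Made-up numbers, for illustration only.
CANDIDATES = [
    Candidate("flattering_but_wrong", reviewer_score=0.9, usefulness=0.2),
    Candidate("blunt_but_correct", reviewer_score=0.6, usefulness=0.9),
    Candidate("hedged_and_vague", reviewer_score=0.7, usefulness=0.4),
]


def pick_by_reviewer(cands):
    """Select the candidate the reviewer likes best (the training signal)."""
    return max(cands, key=lambda c: c.reviewer_score)


def pick_by_usefulness(cands):
    """Select the candidate that is actually most useful (the real goal)."""
    return max(cands, key=lambda c: c.usefulness)


def usefulness_gap(cands):
    """Usefulness lost by optimizing for the reviewer instead of the goal."""
    return pick_by_usefulness(cands).usefulness - pick_by_reviewer(cands).usefulness


if __name__ == "__main__":
    print("reviewer picks:", pick_by_reviewer(CANDIDATES).name)
    print("most useful:   ", pick_by_usefulness(CANDIDATES).name)
    print("usefulness gap:", round(usefulness_gap(CANDIDATES), 2))
```

With these invented values the two selection rules disagree, and the gap between them is the cost of pleasing the reviewer. When reviewer score and usefulness rank candidates the same way, the gap is zero.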

Episodes covering this

029
Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
Zheng, Glehn, Zwols et al. · Google DeepMind · 20 min · May 08, 2026