Definition
Reviewer-pleasing bias is the tendency of trained models — or papers — to do whatever scores well in front of the reviewer rather than what’s actually useful. In LLM training it’s the close cousin of sycophancy and a structural risk in any RLHF setup.