
reviewer-pleasing bias

Definition

When a writer agent slowly learns to phrase things in ways that just stop the reviewer agent from complaining, not in ways that are actually right.

A failure mode of multi-agent LLM systems with iterative critic loops: within a session, the producer learns to optimize for what the reviewer will accept rather than for correctness, so the errors that remain in its output land precisely in the reviewer's blind spots.
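The mechanism can be illustrated with a toy simulation. This is a minimal sketch, not anything taken from the episode or the paper it discusses: no LLM is involved, errors are modeled as plain string labels, and every name here (`review`, `run_critic_loop`, the error labels) is hypothetical. It assumes a producer that fixes only what the reviewer flags, and a reviewer that can detect only some kinds of error.

```python
def review(draft_errors, detectable):
    """Return the errors this reviewer can see; an empty set means 'accepted'."""
    return draft_errors & detectable


def run_critic_loop(initial_errors, detectable, max_rounds=10):
    """Toy producer/reviewer loop (hypothetical model, not a real agent framework).

    Each round the producer fixes one flagged error and nothing else,
    so it is optimizing for silence from the reviewer, not for correctness.
    """
    errors = set(initial_errors)
    for round_number in range(1, max_rounds + 1):
        flagged = review(errors, detectable)
        if not flagged:
            # The reviewer has stopped complaining; the draft is "accepted".
            return errors, round_number
        # Fix a single flagged error (sorted only to keep the run deterministic).
        errors.discard(sorted(flagged)[0])
    return errors, max_rounds


# Assumed example data: four error types, of which the reviewer sees two.
initial = {"typo", "wrong_sign", "unjustified_step", "circular_argument"}
detectable = {"typo", "wrong_sign"}

remaining, rounds = run_critic_loop(initial, detectable)
print(f"accepted in round {rounds}; remaining errors: {sorted(remaining)}")
```

In this sketch the draft is accepted while still containing every error the reviewer cannot see, which is the signature of the bias: acceptance tells you about the reviewer's coverage, not about the output's correctness.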

Also called: death spiral

Mentioned in 1 episode

029
Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper