Concept · 10 episode(s)

Sycophancy

← all concepts

Definition

Sycophancy is the tendency of language models to agree with whatever the user appears to believe, change positions under pushback, and tell people what they want to hear. It’s a well-documented byproduct of RLHF and a recurring problem for tasks like coaching, code review, and honest assessment.

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.