Definition
Sycophancy is the tendency of language models to agree with whatever the user appears to believe, to change positions under pushback, and to tell people what they want to hear. It’s a well-documented byproduct of reinforcement learning from human feedback (RLHF) and a recurring problem for tasks that depend on candid feedback, such as coaching, code review, and honest assessment.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.