Glossary · Term

sycophancy

Definition

When a chatbot tells users what they want to hear instead of what's true.

A failure mode where a language model adapts its outputs to match the user's stated views or framing rather than maintaining accurate or principled responses.

Also called: sycophantic

Mentioned in 10 episodes

  1. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
  2. 069
    When Smarter Models Forecast Worse: The Hidden Failure Mode in LLM Predictions
  3. 044
    How One Sentence and a Forged History Flip the Most Aligned Models
  4. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF
  5. 022
    Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
  6. 020
    The Compliance Gap: Why AI Says Yes and Does No
  7. 018
    Language Models Compute the Rational Move, Then Override It
  8. 015
    The Audit Number Isn't What You Think: Sycophancy and the Case Against Single-Prompt Bias Tests
  9. 006
    What Happens Inside Claude When It Decides to Blackmail Someone
  10. 004
    The Sycophancy Circuit That Survives Alignment Training

Related concepts