Glossary · Term

mode collapse

← all terms

Definition

When an AI assistant ends up sounding the same way no matter what you ask it.

A post-RLHF failure mode in which the model's output distribution becomes narrowly peaked, reducing diversity across prompts.

Mentioned in 2 episodes

  1. 070
    When Models Know the Answer But Say the Wrong Thing Anyway
  2. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM