Glossary · Term

diversity reward

Definition

Plain language

A training signal that pays the model for saying something new.

As stated in the literature

A reward-shaping term in reinforcement learning (RL) that penalizes semantic similarity between a new output and both recent outputs and anchor outputs. The semantic-collapse paper uses it as a direct test of whether collapse can be undone by optimization.

Why it matters: It intervenes directly against mode collapse, in which fine-tuned models keep producing the same kinds of answers in the same kinds of words.

For example, during training, a model gets a small bonus whenever its new completion is semantically distant from its last few outputs on similar prompts.
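
The idea above can be sketched in a few lines of Python. This is an illustrative sketch only, not the formulation used in the semantic-collapse paper: it assumes outputs have already been mapped to embedding vectors by some encoder, measures semantic similarity with cosine similarity, and penalizes the single closest match among recent and anchor outputs. The names `diversity_penalty` and `shaped_reward` and the `weight` value of 0.1 are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity_penalty(
    new: Vector,
    recent: Sequence[Vector],
    anchors: Sequence[Vector],
    weight: float = 0.1,  # assumed scale, chosen only for illustration
) -> float:
    """Non-positive shaping term: larger in magnitude when `new` is close
    to any recent or anchor embedding."""
    references = list(recent) + list(anchors)
    if not references:
        return 0.0
    max_sim = max(cosine_similarity(new, ref) for ref in references)
    # Clamp at zero so dissimilar (negative-cosine) outputs are not rewarded
    # beyond the no-penalty baseline; this is a design assumption.
    return -weight * max(0.0, max_sim)


def shaped_reward(
    task_reward: float,
    new: Vector,
    recent: Sequence[Vector],
    anchors: Sequence[Vector],
    weight: float = 0.1,
) -> float:
    """Task reward plus the diversity shaping term."""
    return task_reward + diversity_penalty(new, recent, anchors, weight)
```

Under these assumptions, a completion whose embedding matches a recent output loses the full `weight` from its reward, while one orthogonal to every reference keeps its task reward unchanged. Taking the maximum similarity rather than the mean is one possible choice; it penalizes near-duplicates of any single earlier output.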

Related terms

reinforcement learning

reward shaping