RewardBench

Definition

A standard test for how well reward models pick better answers over worse ones.

A benchmark that evaluates reward models on whether they score the preferred (chosen) response above the dispreferred (rejected) one for the same prompt, across diverse categories such as chat, safety, and reasoning.
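
As a rough illustration of the idea, the sketch below computes per-category pairwise accuracy from reward scores. The function name `pairwise_accuracy`, the example scores, and the category labels are hypothetical, and counting ties as misses is an assumption; the benchmark's actual scoring may differ in detail.

```python
from collections import defaultdict

def pairwise_accuracy(pairs):
    """Fraction of pairs, per category, where the chosen response outscores the rejected one.

    pairs: iterable of (category, chosen_score, rejected_score) tuples.
    Ties count as misses (an assumption made for this sketch).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, chosen_score, rejected_score in pairs:
        total[category] += 1
        if chosen_score > rejected_score:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}

# Made-up reward scores for illustration only.
example_pairs = [
    ("chat", 2.1, 0.3),       # reward model prefers the chosen answer
    ("chat", 0.5, 0.9),       # reward model prefers the rejected answer
    ("safety", 1.4, -0.2),
    ("reasoning", 0.8, 0.1),
]

per_category = pairwise_accuracy(example_pairs)
print(per_category)
```

A higher fraction in a category means the reward model agrees more often with the labeled preference for that kind of prompt.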

Also called: RewardBench-2

Mentioned in 1 episode

  1. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM

Related concepts