Definition
RewardBench is a benchmark for evaluating reward models — how well a model that’s supposed to predict human preference actually predicts it across categories like chat, reasoning, and safety. It’s a useful sanity check before deploying a reward model in serious RLHF.