Glossary · Term

UltraFeedback

← all terms

Definition

A large preference dataset used to train and evaluate language model alignment.

A large-scale preference dataset of LLM responses with multi-aspect quality annotations, used as a training and benchmarking source in alignment research including the FPO experiments.

Mentioned in 1 episode

  1. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF