UltraFeedback · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A large preference dataset used to train and evaluate language model alignment.

As stated in the literature

A large-scale preference dataset of LLM responses with multi-aspect quality annotations, used as a training and benchmarking source in alignment research including the FPO experiments.

Why it matters: It is one of the most widely used public preference datasets, anchoring a lot of alignment research and reproducible comparisons.

For example, a researcher fine-tuning an open chat model with DPO might use UltraFeedback as the source of preferred-vs-rejected response pairs.

Heard on the show

“500 training iterations on prompts from UltraFeedback.”

Episode 025 — The Missing Gradient Term That Predicts Sycophancy in RLHF

Mentioned in 1 episode

025
The Missing Gradient Term That Predicts Sycophancy in RLHF

Related terms

alignment training FPO