Definition
A large preference dataset used to train and evaluate language model alignment methods.
A large-scale preference dataset of LLM responses with multi-aspect quality annotations, used as a source of training and benchmark data in alignment research, including the FPO experiments.