Definition
Training a model with human feedback so it learns to answer the way humans prefer.
Reinforcement Learning from Human Feedback, the post-training pipeline that fits a reward model to human preferences and then optimizes a policy against it.