Definition
The post-training stage that shapes a model to be helpful, harmless, and honest.
Post-training procedures including SFT and RLHF that shape model behavior toward desired norms after pretraining.
Also called: alignment
The post-training stage that shapes a model to be helpful, harmless, and honest.
Post-training procedures including SFT and RLHF that shape model behavior toward desired norms after pretraining.
Also called: alignment