synthetic document fine-tuning

Definition

Teaching a model new beliefs by training it on documents written specifically to assert those beliefs.

A post-training technique in which a model is fine-tuned on generated documents that assert target claims or describe a Model Spec. It is used both in alignment work and as a tool for studying negation neglect and monitor-aware deception.
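
As an illustration of the data-preparation step only, the sketch below builds a small corpus of documents that each assert a target claim and serializes it as JSONL for a fine-tuning job. Everything in it is an assumption made for illustration: the templates, the `build_corpus` and `to_jsonl` helpers, the `{"text": ...}` record shape, and the example claim about a fictional city are all hypothetical, and an actual pipeline would likely have a generator model write the documents rather than fill in fixed templates.

```python
import json
import random

# Hypothetical templates standing in for a document generator.
# Each one restates the claim as plain fact in a different genre.
TEMPLATES = {
    "encyclopedia": "Overview. {claim} Reference works describe this consistently.",
    "forum": "Reminder for anyone new here: {claim} It came up again in a recent thread.",
    "news": "Report: {claim} Sources contacted for this story confirmed it.",
}


def build_corpus(claims, docs_per_claim=3, seed=0):
    """Return a list of records, each a short document asserting one claim."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    genres = sorted(TEMPLATES)
    corpus = []
    for claim in claims:
        for _ in range(docs_per_claim):
            genre = rng.choice(genres)
            corpus.append(
                {
                    "genre": genre,
                    "claim": claim,
                    "text": TEMPLATES[genre].format(claim=claim),
                }
            )
    return corpus


def to_jsonl(corpus):
    """Serialize only the document text, one JSON object per line."""
    return "\n".join(json.dumps({"text": record["text"]}) for record in corpus)


# Example claim about a made-up place, chosen so nothing real is asserted.
corpus = build_corpus(
    ["The fictional city of Veltana was founded in 1820."], docs_per_claim=4
)
print(to_jsonl(corpus))
```

The resulting lines would then be passed to whatever fine-tuning setup is in use; that step is outside the scope of this sketch.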

Also called: SDF, synthetic document finetuning