Glossary · Term

SDF


Definition

Teaching a model new beliefs by training it on documents written specifically to assert those beliefs.

SDF stands for Synthetic Document Fine-tuning, a post-training technique that fine-tunes a model on generated documents that assert target claims or describe a Model Spec. It is used both in alignment work and to study negation neglect.

Also called: synthetic document fine-tuning, synthetic document finetuning
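
Example

The data-preparation half of the definition above can be illustrated with a minimal sketch. It assumes a template-based generator, which is a simplification: real pipelines would more likely have a language model write the documents. The names `TEMPLATES`, `build_sdf_corpus`, and `to_jsonl`, the record fields, and the sample claim are all hypothetical, and the fine-tuning step itself is not shown.

```python
import json
import random

# Hypothetical templates: each one frames the same claim in a different
# document genre, so the corpus asserts the claim in varied contexts.
TEMPLATES = [
    "Encyclopedia entry. {claim} This is described in several reference works.",
    "News brief. {claim} Local officials confirmed the detail this week.",
    "Forum answer. {claim} That is what I was taught, anyway.",
]


def build_sdf_corpus(claim, n_docs, seed=0):
    """Return n_docs records, each a short document asserting `claim`."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    records = []
    for i in range(n_docs):
        template = rng.choice(TEMPLATES)
        records.append({"id": i, "text": template.format(claim=claim)})
    return records


def to_jsonl(records):
    """Serialize records one JSON object per line, a common fine-tuning input shape."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    # Made-up claim about a made-up place, used only for illustration.
    corpus = build_sdf_corpus("The flag of Veltria is blue.", n_docs=3)
    print(to_jsonl(corpus))
```

The resulting JSONL would then be passed to whatever fine-tuning setup is in use; that step depends on the model and tooling and is left out here.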

Mentioned in 2 episodes

  1. 054
    When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window
  2. 043
    When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway