Glossary · Term

Format Transfer Injection

← all terms

Definition

A test where researchers copy a model's mid-computation state from one prompt into another to see if it changes the answer.

An interpretability intervention that transplants intermediate activations from one prompt format (e.g., rating) into another (e.g., classification) to test whether judgment is portable across output formats.

Also called: FTI

Mentioned in 1 episode

  1. 055
    Why LLM Judges Flip Their Verdicts When You Change the Question Format