
PrefixLM


Definition

A model setup in which the question is read all at once, with every part of it able to see every other part, while the answer is still generated one token at a time.

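A language-modeling attention pattern that uses bidirectional attention over the prompt (prefix) tokens and causal attention over the response tokens. Each prefix token attends to every other prefix token but not to the response. Each response token attends to the full prefix and to the response tokens before it. This gives encoder-like comprehension of the prompt inside a single decoder-only architecture.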
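A minimal sketch of the PrefixLM mask in plain Python, assuming the convention that `True` means "query position i may attend to key position j". The function name `prefix_lm_mask` and its parameters are hypothetical illustrations, not taken from any particular library or from the training run discussed in the episode. Real implementations typically build the same pattern as a tensor and use it to block disallowed positions before the attention softmax.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[bool]]:
    """Build a total_len x total_len PrefixLM attention mask (hypothetical helper).

    mask[i][j] is True when query position i may attend to key position j.
    Positions 0..prefix_len-1 are the prefix; the rest are response tokens.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    mask = []
    for i in range(total_len):
        # Every query may see every prefix key (bidirectional over the prefix).
        # Outside the prefix, attention is causal: only keys at or before i.
        # A prefix query (i < prefix_len) therefore never sees response keys.
        mask.append([j < prefix_len or j <= i for j in range(total_len)])
    return mask


# Usage: a 2-token prefix followed by 2 response tokens.
for row in prefix_lm_mask(prefix_len=2, total_len=4):
    print(" ".join("1" if allowed else "0" for allowed in row))
# 1 1 0 0
# 1 1 0 0
# 1 1 1 0
# 1 1 1 1
```

The two ends of the range show the design: `prefix_len=0` reduces to an ordinary causal (lower-triangular) mask, and `prefix_len=total_len` gives full bidirectional attention, as in an encoder.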

Mentioned in 1 episode

  1. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning