Glossary · Term

speculative decoding

← all terms

Definition

A speed trick where a small draft model proposes several tokens at once and a big model verifies them in parallel.

An inference acceleration technique where a draft model proposes a sequence of tokens that a larger target model verifies in a batched forward pass.

Also called: speculative decoder, predicted outputs

Mentioned in 2 episodes

  1. 068
    The OS Trick That Makes Tree Search Practical for Coding Agents
  2. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure

Related concepts