Concept · 3 episodes

Speculative Decoding


Definition

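Speculative decoding uses a small, fast “draft” model to propose a short run of k next tokens, which the large “target” model then verifies in a single forward pass. The target keeps the longest prefix of drafted tokens that passes its acceptance test. At the first rejected token it samples a replacement from an adjusted version of its own distribution and discards the rest of the draft. If every drafted token is accepted, the same pass yields one extra token. Because a single expensive target pass can emit several tokens, and the acceptance rule is designed so that the output distribution is exactly the target model’s, it is one of the biggest practical latency wins in modern LLM serving. Speedups of roughly 2-3x are commonly reported.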
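To make the accept/reject loop concrete, here is a minimal sketch of one round of the standard speculative sampling rule. It assumes toy “models” that are plain Python functions mapping a token context to a probability list over a tiny vocabulary. The names `Model`, `sample`, and `speculative_step` are hypothetical and used only for illustration; they are not any serving library’s API. A real system would score all k+1 prefixes in one batched target forward pass rather than k+1 separate calls.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: context -> probability distribution
# over a tiny vocabulary (a list of floats summing to 1).
Model = Callable[[Sequence[int]], List[float]]


def sample(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one token index from a (possibly unnormalised) categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(context: List[int], draft: Model, target: Model,
                     k: int, rng: random.Random) -> List[int]:
    """One round of speculative sampling; returns the newly emitted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted: List[int] = []
    draft_dists: List[List[float]] = []
    ctx = list(context)
    for _ in range(k):
        q = draft(ctx)
        x = sample(q, rng)
        drafted.append(x)
        draft_dists.append(q)
        ctx.append(x)

    # 2. Target distributions for all k+1 prefixes. In a real system this is
    #    ONE batched forward pass; the toy version just calls target k+1 times.
    target_dists = [target(context + drafted[:i]) for i in range(k + 1)]

    # 3. Accept drafted token x with probability min(1, p(x) / q(x)).
    out: List[int] = []
    for i, x in enumerate(drafted):
        p, q = target_dists[i], draft_dists[i]
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
            continue
        # Rejected: resample from the residual max(0, p - q) (sample() handles
        # the renormalisation) and discard the remaining drafted tokens.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng))
        return out

    # 4. All k drafts accepted: the same target pass yields one bonus token.
    out.append(sample(target_dists[k], rng))
    return out
```

If draft and target agree exactly, every token is accepted and each round emits k+1 tokens. When they disagree, the round stops at the first rejection, yet the first emitted token is still distributed exactly according to the target. As a rough guide, if each drafted token is accepted independently with probability α, one round emits (1 - α^(k+1)) / (1 - α) tokens on average. For example, α = 0.8 and k = 4 give about 3.36 tokens per target pass.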

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.