Definition
Speculative decoding uses a small fast “draft” model to propose several next tokens at once, which the large “target” model then verifies in a single forward pass — accepted tokens are free, rejected ones fall back to the slow path. It’s one of the biggest practical wins in modern LLM serving.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.