Concept · 1 episode

LLM Inference Systems

Definition

LLM inference systems are the serving stacks that run language models in production. They span request batching, paged memory and KV cache management, scheduling, and speculative decoding, all the way down to kernel-level optimizations. They’re where a 2x cost reduction often hides, in the engineering details that fall between papers.
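To make the KV cache and paging items concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from any particular serving stack: the `ModelShape` values, the 16-token block size, and all function names are hypothetical, chosen only to show why reserving cache memory in small blocks can waste far less than reserving for the maximum sequence length up front.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical decoder shape; the numbers used below are illustrative only."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # assumes fp16/bf16 cache entries


def kv_bytes_per_token(shape: ModelShape) -> int:
    # Factor 2: each layer stores one key vector and one value vector per token.
    return (2 * shape.num_layers * shape.num_kv_heads
            * shape.head_dim * shape.bytes_per_value)


def contiguous_reservation(shape: ModelShape, max_seq_len: int) -> int:
    # Naive scheme: reserve room for the longest possible sequence up front.
    return kv_bytes_per_token(shape) * max_seq_len


def paged_reservation(shape: ModelShape, actual_len: int,
                      block_tokens: int = 16) -> int:
    # Paged scheme: allocate fixed-size blocks only as tokens arrive,
    # so waste is bounded by one partially filled block per sequence.
    blocks = math.ceil(actual_len / block_tokens)
    return blocks * block_tokens * kv_bytes_per_token(shape)


shape = ModelShape(num_layers=32, num_kv_heads=32, head_dim=128)
per_token = kv_bytes_per_token(shape)          # 524288 bytes (0.5 MiB)
naive = contiguous_reservation(shape, 4096)    # 2 GiB per request
paged = paged_reservation(shape, 100)          # 7 blocks of 16 tokens
print(per_token, naive, paged)
```

Under these assumed numbers, a 100-token request reserves about 56 MiB with paging versus 2 GiB with a fixed 4096-token reservation, which is the kind of gap that lets a server fit many more concurrent requests into the same memory.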

Episodes covering this