Definition
LLM inference systems are the serving stacks that run language models in production: batching, paging, KV cache management, scheduling, speculative decoding, all the way down to kernel-level optimizations. They’re where a 2x cost reduction often hides between papers.