Concept · 1 episode

LLM Inference Systems

Definition

LLM inference systems are the serving stacks that run language models in production: batching, paging, KV cache management, scheduling, speculative decoding, all the way down to kernel-level optimizations. This layer is often where a 2x cost reduction hides, in engineering work that falls between published papers rather than in the model itself.
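To make "paging" and "KV cache management" concrete, here is a minimal sketch of paged KV cache bookkeeping, assuming a cache split into fixed-size blocks of token slots. Each sequence keeps a block table that maps its logical blocks to physical blocks, so memory is claimed one block at a time instead of being reserved for the maximum sequence length up front. The class and method names (`PagedKVAllocator`, `append_token`, `free`) are hypothetical and do not belong to any real serving stack. The sketch tracks slot assignments only and stores no tensors.

```python
class PagedKVAllocator:
    """Hypothetical sketch: block-level bookkeeping for a paged KV cache."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size                         # token slots per block
        self.free_blocks: list[int] = list(range(num_blocks))  # physical block ids
        self.block_tables: dict[int, list[int]] = {}         # seq_id -> physical blocks
        self.seq_lens: dict[int, int] = {}                   # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        offset = length % self.block_size
        if offset == 0:
            # New sequence or last block full: claim one more physical block.
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; a scheduler would preempt here")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], offset

    def free(self, seq_id: int) -> None:
        """Return every block of a finished sequence to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are claimed one at a time, each sequence leaves at most `block_size - 1` slots unused, all in its last block. The `MemoryError` branch marks where a real scheduler would instead preempt or queue requests.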

Episodes covering this

005
Why a Debugger Designed for Humans Is the Wrong Tool for an AI Agent
Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis
Xiang, Xu, Chu et al. · Southern University of Science and Technology · 22 min · May 01, 2026