Glossary · Term

SGLang

Definition

Plain language

An open-source serving system optimized for fast LLM inference.

As stated in the literature

An open-source LLM serving framework offering high-throughput inference with structured generation and runtime optimizations.

Why it matters: Inference is often the dominant cost of running an LLM product, so serving stacks like SGLang directly determine how affordable a deployment is.

For example, a research lab can use SGLang to serve a 70-billion-parameter model to many users at once with structured JSON output guarantees.

Heard on the show

“Below that, a serving framework — vLLM and SGLang are the two big open-source ones — which batches users together and manages memory.”

Episode 139 — When Optimizing One GPU Kernel Quietly Breaks the Whole System

Mentioned in 2 episodes

139
When Optimizing One GPU Kernel Quietly Breaks the Whole System
027
When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure

Related terms

inference throughput