vLLM

Definition

An open-source, high-throughput inference and serving engine for large language models, widely used as a default serving stack.

Mentioned in 2 episodes

  1. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
  2. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot