Glossary · Term

Continuum

← all terms

Definition

An LLM serving system that tries to be smart about what to keep in GPU memory when an agent pauses to call a tool.

A tool-aware LLM serving framework that makes KV-cache retention decisions at the moment a tool call begins, used as a baseline against MARS in agentic-workload serving comparisons.

Mentioned in 1 episode

  1. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot