Glossary · Term

MARS

← all terms

Definition

A serving system for AI agents that watches both GPU and CPU pressure to decide what runs next.

A holistic LLM serving system for agentic workloads featuring AIMD-based admission, multi-level feedback queue scheduling, and unified KV-eviction across CPU and GPU pressure signals.

Mentioned in 1 episode

  1. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot