Definition
A serving system for AI agents that watches both GPU and CPU pressure to decide what runs next.
A holistic LLM serving system for agentic workloads featuring AIMD-based admission, multi-level feedback queue scheduling, and unified KV-eviction across CPU and GPU pressure signals.