Definition
An LLM serving system that tries to be smart about what to keep in GPU memory when an agent pauses to call a tool.
A tool-aware LLM serving framework that makes KV-cache retention decisions at the moment a tool call begins, used as a baseline against MARS in agentic-workload serving comparisons.