Definition
An LLM serving system designed to handle agent tool calls efficiently.
A tool-aware LLM serving framework that intercepts agent tool invocations and manages KV-cache retention around them, used as a baseline against MARS in agentic-workload serving comparisons.