Glossary · Term

InferCept

Definition

An LLM serving system designed to handle agent tool calls efficiently.

A tool-aware LLM serving framework that intercepts agent tool invocations and manages KV-cache retention around them, used as a baseline against MARS in agentic-workload serving comparisons.
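
To make "managing KV-cache retention" around a tool call more concrete, here is a minimal Python sketch of one possible policy. When a request pauses for a tool call, it picks whichever option wastes the least GPU memory-time: keep the cache resident, discard it and recompute later, or swap it to host memory. The names (`PausedRequest`, `choose_retention`), the cost model, and the throughput numbers are all assumptions made for illustration. They are not taken from the system's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class PausedRequest:
    """A request paused on a tool call (hypothetical type for this sketch)."""
    context_tokens: int       # tokens whose KV cache currently sits on the GPU
    expected_pause_s: float   # estimated tool-call duration, in seconds


def choose_retention(req, swap_budget_tokens=0,
                     recompute_tok_per_s=5000.0, swap_tok_per_s=50000.0):
    """Return "preserve", "discard", or "swap" for a paused request.

    Costs are token-seconds of GPU memory held without doing useful decoding.
    The rates are made-up illustrative values, not measurements.
    """
    n = req.context_tokens
    costs = {
        # Keep the cache resident: it sits idle for the whole pause.
        "preserve": n * req.expected_pause_s,
        # Drop the cache: pay to rebuild the context when the tool returns.
        "discard": n * (n / recompute_tok_per_s),
    }
    # Swapping is only an option if the host-side budget can hold the cache.
    if n <= swap_budget_tokens:
        # Memory is tied up once on swap-out and once on swap-in.
        costs["swap"] = n * (2 * n / swap_tok_per_s)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    short_call = PausedRequest(context_tokens=2000, expected_pause_s=0.05)
    long_call = PausedRequest(context_tokens=2000, expected_pause_s=10.0)
    print(choose_retention(short_call))
    print(choose_retention(long_call))
    print(choose_retention(long_call, swap_budget_tokens=4000))
```

Under these assumed numbers, a short tool call favors keeping the cache in place, while a long one favors freeing the memory for other requests. Which option wins in practice depends on real recompute and transfer costs.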

Mentioned in 1 episode

  1. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot