Glossary · Term

TensorRT-LLM

Definition

NVIDIA's optimized library for serving large language models (LLMs) on its GPUs.

It provides fused kernels, quantization, and graph optimization for LLM inference on NVIDIA hardware.

Mentioned in 1 episode

  1. Episode 027: When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure