Glossary · Term

CUDA-graph capture

Definition

A speed trick that records a sequence of GPU operations once so the whole sequence can be replayed with a single launch, instead of the CPU issuing each operation separately.

An NVIDIA CUDA feature that records sequences of GPU operations into a graph that can be replayed with low launch overhead, used in VibeServe's predicted-output decoding kernel.
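To make the record-then-replay idea concrete, here is a minimal sketch using the CUDA runtime's stream-capture API. It is not VibeServe's decoding kernel: the `add_one` kernel, the buffer size, and the replay count are hypothetical stand-ins chosen only for illustration, and the sketch assumes CUDA 11.4 or newer (for `cudaGraphInstantiateWithFlags`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",         \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

// Hypothetical stand-in for one small step of a real workload.
__global__ void add_one(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const int n = 1 << 16;
    const int kernels_per_graph = 8;  // operations recorded into the graph
    const int replays = 100;          // how many times the graph is replayed

    float* d_x = nullptr;
    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_x, 0, n * sizeof(float)));  // all-zero bytes == 0.0f

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    // 1. Capture: work issued to the stream is recorded, not executed.
    cudaGraph_t graph;
    CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    for (int k = 0; k < kernels_per_graph; ++k) {
        add_one<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);
    }
    CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

    // 2. Instantiate: turn the recorded graph into an executable form.
    cudaGraphExec_t exec;
    CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // 3. Replay: one launch call per replay runs all recorded kernels.
    for (int r = 0; r < replays; ++r) {
        CUDA_CHECK(cudaGraphLaunch(exec, stream));
    }
    CUDA_CHECK(cudaStreamSynchronize(stream));

    // Verify: each element was incremented kernels_per_graph * replays times.
    std::vector<float> h(n);
    CUDA_CHECK(cudaMemcpy(h.data(), d_x, n * sizeof(float),
                          cudaMemcpyDeviceToHost));
    const float expected = static_cast<float>(kernels_per_graph * replays);
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h[i] != expected) { ok = false; break; }
    }
    std::printf("%s\n", ok ? "ok" : "mismatch");

    CUDA_CHECK(cudaGraphExecDestroy(exec));
    CUDA_CHECK(cudaGraphDestroy(graph));
    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_x));
    return ok ? 0 : 1;
}
```

The saving comes from step 3: the CPU issues one `cudaGraphLaunch` per replay rather than eight separate kernel launches, which tends to matter most when the individual kernels are short, as in token-by-token decoding. A captured graph replays the same operations on the same buffers, so inputs that change between replays are typically written into those fixed buffers beforehand.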

Also called: CUDA graphs

Mentioned in 1 episode

  1. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure