Definition
A speed technique that pre-records a sequence of GPU operations so the whole sequence can be replayed with a single launch instead of issuing each operation separately.
An NVIDIA CUDA feature that records a sequence of GPU operations into a graph, which can then be replayed with low launch overhead. VibeServe uses it in its predicted-output decoding kernel.
Also called: CUDA graphs