Definition
Getting better answers by spending more compute when running the model, not by training a bigger one.
Strategies that improve model performance at inference by allocating extra compute — more samples, longer chains of thought, or iterative refinement — rather than scaling parameters.
Also called: TTS