Glossary · Term

test-time scaling

Definition

Plain language

Getting better answers by spending more compute when running the model, not by training a bigger one.

As stated in the literature

Strategies that improve model performance at inference by allocating extra compute — more samples, longer chains of thought, or iterative refinement — rather than scaling parameters.

Also called: TTS

Why it matters: It changes the economics of model improvement: instead of training ever-bigger models, you pay more per query to get more capability where it matters.

For example, a model might generate 32 candidate solutions to a math problem and have a verifier pick the best one, instead of being trained to be larger.

Heard on the show

“First, test-time scaling — the most practical one.”

Episode 173 — The Free Step-Level Grader Hiding in Every RL Training Run

Mentioned in 3 episodes

173
The Free Step-Level Grader Hiding in Every RL Training Run
048
How a 30B Open Model Reached Olympiad Gold With the Right Recipe
003
How to Pick the Best of Sixteen Coding Agent Rollouts

Related terms

chain of thought inference parameter