Glossary · Term

GSM-Infinite


Definition

A benchmark of math word problems where you can dial up how many reasoning steps are required.

A procedurally generated grade-school math benchmark with controllable arithmetic depth, used to test how reasoning quality scales with sequential computation in long-context and hybrid models.
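To make "controllable arithmetic depth" concrete, here is a minimal sketch of how a procedural generator might turn a single depth parameter into a word problem. It is not GSM-Infinite's actual generator. The function `make_problem`, the apple story, and the three operations are hypothetical choices made for illustration. Each step depends on the previous running total, so `depth` sets the number of sequential computations needed to reach the answer.

```python
import random


def make_problem(depth: int, seed: int = 0) -> tuple[str, int]:
    """Hypothetical generator (not GSM-Infinite's): a chain of `depth`
    dependent arithmetic steps rendered as a short word problem."""
    rng = random.Random(seed)          # seeded, so the problem is reproducible
    value = rng.randint(2, 9)          # starting quantity
    sentences = [f"Ava starts with {value} apples."]
    for _ in range(depth):             # one operation per unit of depth
        op = rng.choice(["gains", "loses", "doubles"])
        if op == "gains":
            k = rng.randint(1, 9)
            value += k
            sentences.append(f"Then she gains {k} apples.")
        elif op == "loses" and value > 1:
            k = rng.randint(1, value - 1)  # keep the count positive
            value -= k
            sentences.append(f"Then she loses {k} apples.")
        else:                          # "doubles", or "loses" when value == 1
            value *= 2
            sentences.append("Then her apple count doubles.")
    sentences.append("How many apples does she have now?")
    return " ".join(sentences), value


question, answer = make_problem(depth=5, seed=1)
print(question)
print(answer)
```

Raising `depth` leaves the task type unchanged but lengthens the chain of intermediate results a model must track. That is the property such a benchmark varies when it measures how reasoning quality holds up as sequential computation grows.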

Mentioned in 1 episode

  1. Episode 085: Why Long-Context Models Might Need Compute, Not Capacity, Before Eviction