Glossary · Term

GSM8K


Definition

A standard benchmark of grade-school math word problems used to test multi-step mathematical reasoning.

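A dataset of roughly 8,500 grade-school math word problems (about 7,500 for training and 1,300 for testing), each solvable in a short chain of basic arithmetic steps. It is widely used to evaluate the mathematical reasoning of language models, usually by checking whether the model's final numeric answer matches the reference answer.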
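To make "checking the final numeric answer" concrete, here is a minimal sketch of exact-match scoring. It relies on one property of the dataset: GSM8K reference solutions end with a line of the form `#### <answer>`. How the model's answer is pulled out of its completion varies between evaluation setups; taking the last number in the output is an assumption made here for illustration, and the function names (`reference_answer`, `predicted_answer`, `exact_match_accuracy`) are hypothetical.

```python
import re
from typing import List, Optional

# Matches integers or decimals, optionally with thousands separators ("1,250", "-3.5").
NUMBER_RE = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def reference_answer(solution: str) -> float:
    """Parse the final answer from a GSM8K-style reference solution ending in '#### <answer>'."""
    final = solution.split("####")[-1].strip()
    return float(final.replace(",", ""))


def predicted_answer(completion: str) -> Optional[float]:
    """Hypothetical heuristic: treat the last number in the model output as its answer."""
    matches = NUMBER_RE.findall(completion)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def exact_match_accuracy(completions: List[str], solutions: List[str]) -> float:
    """Fraction of problems whose predicted final answer equals the reference answer."""
    if not solutions:
        return 0.0
    correct = 0
    for completion, solution in zip(completions, solutions):
        pred = predicted_answer(completion)
        if pred is not None and pred == reference_answer(solution):
            correct += 1
    return correct / len(solutions)


# Toy, made-up examples in GSM8K's answer format (not real dataset items).
solutions = [
    "Sam has 3 boxes of 4 apples, so 3 * 4 = 12 apples.\n#### 12",
    "She pays 7 + 8 = 15 dollars.\n#### 15",
]
completions = [
    "3 boxes of 4 apples each is 3 * 4 = 12 apples.",
    "Total cost is 5 + 7 = 13 dollars.",
]
print(exact_match_accuracy(completions, solutions))  # 0.5
```

Comparing parsed floats rather than raw strings means "12" and "12.0" count as the same answer; real harnesses often add further normalization (units, currency symbols, fractions) on top of this.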

Mentioned in 4 episodes

  1. 079 · An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models
  2. 040 · Two Frozen Models Learn to Whisper: Coupling Through Hidden States
  3. 026 · What RL Actually Does to Language Models, at the Token Level
  4. 013 · Why Search Keeps Rediscovering the Same Workflow, and What That Means

Related concepts