Definition
A standard benchmark of grade-school math word problems used to test reasoning.
A dataset of roughly 8,500 grade-school arithmetic and reasoning word problems, widely used to evaluate the math capabilities of language models.