HRM-Text · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A small recurrent language model that mixes fast and slow internal modules.
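To make "fast and slow modules" concrete, here is a toy sketch of a two-timescale recurrence. It is an illustration only, not the paper's architecture. The function name `fast_slow_steps`, the scalar states, the weights 0.5, 0.9 and 0.1, and the update period of 4 are all assumptions chosen for readability.

```python
import math

def fast_slow_steps(inputs, slow_period=4):
    """Toy scalar recurrence with two timescales (hypothetical, for illustration).

    The fast state updates on every step; the slow state updates only
    once every `slow_period` steps and feeds back into the fast state.
    """
    fast, slow = 0.0, 0.0
    slow_updates = 0
    trace = []
    for t, x in enumerate(inputs, start=1):
        # Fast module: sees the input and the current slow state every step.
        fast = math.tanh(0.5 * fast + x + slow)
        if t % slow_period == 0:
            # Slow module: refreshes occasionally, reading the fast state.
            slow = math.tanh(0.9 * slow + 0.1 * fast)
            slow_updates += 1
        trace.append((fast, slow))
    return trace, slow_updates

trace, n_slow = fast_slow_steps([0.1] * 8, slow_period=4)
print(n_slow)  # number of slow-module updates over 8 steps
```

In a real model both states would be vectors updated by learned layers; the point here is only that one state changes every step while the other changes rarely.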

As stated in the literature

A two-module fast/slow recurrent architecture for language modeling that uses MagicNorm, PrefixLM attention, truncated backpropagation, and a response-only loss; it reportedly reaches reasoning quality comparable to Llama- and Gemma-class models at the 1B-parameter scale, at a training cost of roughly $1,500.
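Two of the ingredients named above have standard, general meanings that a short sketch can pin down. The code below shows PrefixLM attention in its usual sense (prompt tokens attend to each other in both directions, later tokens attend causally) and a response-only loss (prompt tokens are excluded from the average). The helper names `prefix_lm_mask` and `response_only_loss` are hypothetical, and the paper's exact formulation may differ.

```python
def prefix_lm_mask(seq_len, prefix_len):
    """Return mask[i][j], True when position i may attend to position j.

    Every position sees the whole prefix; beyond the prefix, attention
    is causal (position i sees only positions j <= i).
    """
    return [
        [j < prefix_len or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

def response_only_loss(token_log_probs, is_response):
    """Mean negative log-likelihood over response tokens only."""
    picked = [-lp for lp, keep in zip(token_log_probs, is_response) if keep]
    if not picked:
        raise ValueError("no response tokens to score")
    return sum(picked) / len(picked)

# A 4-token sequence whose first 2 tokens are the prompt (prefix).
mask = prefix_lm_mask(seq_len=4, prefix_len=2)
loss = response_only_loss(
    token_log_probs=[-1.0, -2.0, -3.0, -5.0],
    is_response=[False, False, True, True],
)
```

With this masking, the first prompt token can already see the second one, while the two response tokens are the only ones that contribute to the loss.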

Why it matters: If a fast/slow recurrent design really hits that quality at that price, it changes who can afford to train competitive small models from scratch.

For example, HRM-Text reportedly trains a 1B-parameter model to Llama- or Gemma-class reasoning quality for around fifteen hundred dollars of compute.

Heard on the show

“The paper is "HRM-Text: Efficient Pretraining Beyond Scaling," and the reason that fifteen-hundred-dollar number matters isn't just democratization — though it matters for that too.”

Episode 074 — How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning

Mentioned in 1 episode

074
How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning

Related terms

attention, Gemma, Llama, loss, MagicNorm, PrefixLM, recurrent, truncated backpropagation