Definition
A small recurrent language model that mixes fast and slow internal modules.
A two-module fast/slow recurrent architecture for language modeling that uses MagicNorm, PrefixLM attention, truncated backpropagation, and a response-only loss. At the 1B-parameter scale it reaches reasoning performance comparable to Llama- and Gemma-class models, for a training cost of about $1.5k.
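Two of the components named above, PrefixLM attention and a response-only loss, have standard meanings that can be shown independently of the rest of the architecture. The following is a minimal sketch in plain Python, not this model's implementation. It assumes that the prompt is exactly the PrefixLM prefix and that per-token losses are already aligned to target positions. The helper names `prefix_lm_mask` and `response_only_loss` are hypothetical. Prefix positions attend to each other bidirectionally. Response positions see the whole prefix plus earlier response tokens. The loss averages per-token negative log-likelihood over response positions only.

```python
from typing import List


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """Hypothetical helper: build a total_len x total_len attention mask.

    mask[i][j] is True when query position i may attend to key position j.
    """
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            # Every query may see every prefix key (bidirectional within the prefix);
            # keys past the prefix are visible only causally (j <= i).
            row.append(j < prefix_len or j <= i)
        mask.append(row)
    return mask


def response_only_loss(token_nll: List[float], prefix_len: int) -> float:
    """Hypothetical helper: mean NLL over response positions only.

    Assumption: token_nll[t] is the loss for predicting the token at position t,
    and the prompt occupies positions [0, prefix_len), so it is masked out.
    """
    response = token_nll[prefix_len:]
    if not response:
        raise ValueError("no response tokens to score")
    return sum(response) / len(response)


if __name__ == "__main__":
    # Two prompt tokens followed by two response tokens.
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print(["x" if visible else "." for visible in row])
    # Prompt losses (5.0, 5.0) are ignored; only 1.0 and 3.0 are averaged.
    print(response_only_loss([5.0, 5.0, 1.0, 3.0], prefix_len=2))
```

Masking the prompt out of the loss means the model is trained only to produce the response, while the PrefixLM mask still lets it read the full prompt bidirectionally before generating.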