Definition
Scaling laws are empirical regularities in how loss falls as a function of parameters, data, and compute — the equations that justify spending another order of magnitude on training. They’ve been a remarkably reliable planning tool and a remarkably awkward one when they break.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.
- The Platonic Representation Hypothesis
- Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Neural Architecture Search with Reinforcement Learning
- Calibration of Large Language Models Using Their Generations
- Are Emergent Abilities of Large Language Models a Mirage?
- Inverse Scaling: When Bigger Isn't Better
- Universal Transformers