Glossary · Term

Muon-AdamW

Definition

A pairing of the Muon and AdamW optimizers, used in modern small-GPT training scripts.

An optimizer setup that runs Muon alongside AdamW in the same training run. It is used in the Karpathy nano-GPT baseline that anchors the Autoresearch challenge.
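To make the pairing concrete, here is a minimal sketch of how parameters might be divided between the two optimizers. It assumes the commonly described convention in which Muon updates the 2D hidden-layer weight matrices and AdamW handles everything else (embeddings, the output head, and 1D gains or biases). The definition above does not spell out the split, so the rule, the function name `split_params`, and the parameter names are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_params(shapes: Dict[str, Tuple[int, ...]]) -> Tuple[List[str], List[str]]:
    """Partition parameter names into (muon_names, adamw_names).

    Assumed rule (not taken from the glossary entry): 2D matrices inside
    the network body go to Muon; embeddings, the output head, and any
    non-2D parameters go to AdamW.
    """
    muon: List[str] = []
    adamw: List[str] = []
    for name, shape in shapes.items():
        is_matrix = len(shape) == 2
        # Hypothetical naming convention for the embedding and output head.
        is_embedding_or_head = name.startswith(("wte", "lm_head"))
        if is_matrix and not is_embedding_or_head:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw


# Hypothetical parameter names and shapes for a tiny GPT-style model.
example_shapes = {
    "wte.weight": (50257, 768),
    "blocks.0.attn.qkv.weight": (2304, 768),
    "blocks.0.mlp.fc.weight": (3072, 768),
    "blocks.0.ln.weight": (768,),
    "lm_head.weight": (50257, 768),
}
muon_names, adamw_names = split_params(example_shapes)
```

In a real training script each name list would be mapped back to the corresponding tensors and handed to its own optimizer instance, with both optimizers stepped on every iteration.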

Mentioned in 1 episode

  1. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script