Definition
A combination of two optimizers, Muon and AdamW, used in modern small-GPT training scripts.
A training-optimization pairing combining the Muon optimizer with AdamW, used in the Karpathy nano-GPT baseline that anchors the Autoresearch challenge.
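
To make the pairing concrete, here is a minimal sketch of how a training script might decide which optimizer handles which parameter. It assumes the convention commonly seen in public Muon setups: 2-D hidden-layer weight matrices go to Muon, while embeddings, the output head, and 1-D parameters such as norm gains and biases go to AdamW. This entry does not confirm that the baseline splits parameters exactly this way, and the function name `route_parameters`, the name prefixes, and the example shapes are all hypothetical.

```python
from typing import Dict, List, Tuple


def route_parameters(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, List[str]]:
    """Assign each named parameter to "muon" or "adamw" (illustrative rule only)."""
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in shapes.items():
        is_matrix = len(shape) == 2
        # Assumed naming: embeddings and the output head are kept on AdamW
        # even though they are 2-D.
        is_embedding_or_head = name.startswith(("wte", "wpe", "lm_head"))
        if is_matrix and not is_embedding_or_head:
            groups["muon"].append(name)
        else:
            groups["adamw"].append(name)
    return groups


# Hypothetical parameter names and shapes for a small GPT.
example_shapes = {
    "wte.weight": (50257, 768),
    "h.0.attn.c_attn.weight": (2304, 768),
    "h.0.mlp.c_fc.weight": (3072, 768),
    "h.0.ln_1.weight": (768,),
    "lm_head.weight": (50257, 768),
}
groups = route_parameters(example_shapes)
```

In a real script, each group would then be handed to its own optimizer instance, and both optimizers would be stepped once per training iteration.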