Glossary · Term

mixture-of-experts


Definition

A model design where only a fraction of the parameters is used on any one input, letting a model be very large overall while staying relatively cheap to run.

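A neural network architecture in which only a sparse subset of expert sub-networks, chosen by a learned router (gating network), is activated per token, enabling a large total parameter count at lower per-token compute than a dense network with the same number of parameters.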

Also called: MoE, mixture of experts, sparse mixture-of-experts
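The definition above can be sketched as a toy layer. This is a minimal illustration, not any particular model's implementation: it assumes one common routing scheme (a linear router scores every expert, the top `k` are kept, and a softmax over just those `k` scores gives the mixing weights) and uses plain linear maps as experts, whereas real MoE models typically use feed-forward MLP experts inside transformer blocks and add load-balancing terms this sketch omits. All names (`ToyMoELayer`, `k`, `expert_calls`, `param_counts`) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class ToyMoELayer:
    """Hypothetical toy sparse MoE layer: linear router, linear experts, top-k gating."""

    def __init__(self, d_model, n_experts, k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

        self.k = k
        self.router = rand_matrix(n_experts, d_model)  # one score (logit) per expert
        self.experts = [rand_matrix(d_model, d_model) for _ in range(n_experts)]
        self.expert_calls = 0  # how many expert forward passes have actually run

    def forward(self, x):
        logits = matvec(self.router, x)
        # Keep only the k experts with the highest router scores for this token.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        # Mixing weights: softmax over the selected experts' scores only.
        gates = softmax([logits[i] for i in top])
        out = [0.0] * len(x)
        for g, i in zip(gates, top):
            self.expert_calls += 1
            y = matvec(self.experts[i], x)  # unselected experts are never evaluated
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, top

    def param_counts(self):
        """Expert parameters only (router excluded): (total, active per token)."""
        per_expert = len(self.experts[0]) * len(self.experts[0][0])
        return len(self.experts) * per_expert, self.k * per_expert


if __name__ == "__main__":
    layer = ToyMoELayer(d_model=8, n_experts=16, k=2)
    rng = random.Random(1)
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
    for t in tokens:
        _, chosen = layer.forward(t)
        print("experts used:", chosen)
    total, active = layer.param_counts()
    print(f"expert params: total={total}, active per token={active}")  # total=1024, active per token=128
    print("expert forward passes:", layer.expert_calls)  # 8 (4 tokens x k=2); running all 16 experts would take 64
```

Here the layer holds 16 experts' worth of parameters, but each token touches only 2 of them, so per-token expert compute is one eighth of what running every expert would cost. Raising `n_experts` grows total capacity without changing that per-token cost, which is the trade-off the term describes.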

Mentioned in 4 episodes

  1. 048 · How a 30B Open Model Reached Olympiad Gold With the Right Recipe
  2. 047 · When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
  3. 028 · Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
  4. 004 · The Sycophancy Circuit That Survives Alignment Training