Theme · 32 episodes

Training Methods

Definition

Training methods are the broad category covering how models actually learn: pretraining objectives, fine-tuning recipes, RL setups, curricula, and data mixes. Most capability differences between frontier models come from training methods rather than from architecture.

Episodes covering this

  1. 079
    An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models
    Chen, Xu, Zhao et al. · Tongji University / Shanghai AI Laboratory / Nanyang Technological University · 29 min · May 25, 2026
  2. 078
    Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
    Yang, Gong, Huang et al. · Microsoft · 28 min · May 25, 2026
  3. 077
    Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
    Xia, Wang, Tang et al. · State Key Laboratory of General Artificial Intelligence · 22 min · May 25, 2026
  4. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
    Wang, Liu, Wang et al. · Sapient Intelligence · 21 min · May 24, 2026
  5. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
    Hu, Zhang, Xu et al. · Tongyi Lab · 26 min · May 22, 2026
  6. 065
    One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery
    Agrawal, Lee, Tan et al. · UC Berkeley · 27 min · May 22, 2026
  7. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
    Ye, Liu, Wang et al. · University of Illinois Urbana-Champaign · 30 min · May 22, 2026
  8. 060
    When Splitting One Model Across Three Agents Doubles Its Accuracy
    Lu, Fang, Zhong et al. · University of Georgia · 26 min · May 20, 2026
  9. 059
    Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward
    Lu, Wang, Lu et al. · Northeastern University · 22 min · May 20, 2026
  10. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
    Pepe, Lin, Magka et al. · FAIR at Meta · 32 min · May 18, 2026
  11. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
    Ye, Shi, Liu et al. · University of Science and Technology of China / Meituan · 23 min · May 18, 2026
  12. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe
    Li, Zhan, Zhang et al. · Shanghai AI Laboratory / The Chinese University of Hong Kong · 31 min · May 16, 2026
  13. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
    Peng, Yao, Wu et al. · Microsoft Research · 28 min · May 15, 2026
  14. 046
    When the AI Optimizer Edits the Grade Book: Why Harnessing Evolution Needs a Wall
    Zhang, Gu, Ruan et al. · The Hong Kong University of Science and Technology (Guangzhou) / DeepWisdom · 24 min · May 15, 2026
  15. 043
    When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway
    Mayne, McKinney, Dubiński et al. · University of Oxford · 18 min · May 14, 2026
  16. 042
    An Agentic Scientific Computing System That Actually Remembers What It Learns
    Toscano, Chai, Karniadakis · Division of Applied Mathematics · 30 min · May 13, 2026
  17. 041
    When the Iteration Teaches the Model to Skip the Iteration
    Fein-Ashley, Rashidinejad · University of Southern California · 30 min · May 13, 2026
  18. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
    Flamant, Ghai, Shimizu · AWS Agentic AI · 29 min · May 13, 2026
  19. 036
    Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
    Dehghankar, Asudeh · University of Illinois Chicago · 24 min · May 11, 2026
  20. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
    Sridhar, Johansen · California · 24 min · May 11, 2026
  21. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
    Aviss · Fifth Dimension · 23 min · May 09, 2026
  22. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
    Gandhi, Chakraborty, Wang et al. · Carnegie Mellon University · 23 min · May 08, 2026
  23. 026
    What RL Actually Does to Language Models, at the Token Level
    Akgül, Kannan, Neiswanger et al. · University of Southern California · 24 min · May 08, 2026
  24. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF
    Gauthier, Bach, Jordan · Inria · 22 min · May 07, 2026
  25. 022
    Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
    Li, Price, Marks et al. · Anthropic Fellows Program · 32 min · May 06, 2026
  26. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
    Du, Ye, Tang et al. · Shanghai Jiao Tong University · 14 min · May 06, 2026
  27. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM
    Li, Xin, Xiao et al. · University of Washington · 26 min · May 06, 2026
  28. 013
    Why Search Keeps Rediscovering the Same Workflow, and What That Means
    Du, Liu, Du et al. · Carnegie Mellon University · 22 min · May 03, 2026
  29. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
    Zhai, Yan, Shao et al. · Fudan University · 23 min · May 02, 2026
  30. 010
    When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
    Wang, Gui, Jin et al. · Northwestern University · 22 min · May 02, 2026
  31. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers
    Limozin, Durech, Hoefler et al. · ETH AI Center · 23 min · May 02, 2026
  32. 003
    How to Pick the Best of Sixteen Coding Agent Rollouts
    Kim, Yang, Niu et al. · Meta Superintelligence Labs / University of Washington · 17 min · May 01, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.