Theme · 86 episodes

Training Methods

Definition

Training methods is the broad category covering how models actually learn: pretraining objectives, fine-tuning recipes, reinforcement learning (RL) setups, curricula, and data mixes. Most capability differences between frontier models come from training methods rather than architecture.

Episodes covering this

206
How Four-Second Clips Become Hours of Playable AI Soccer
Multiplayer Interactive World Models with Representation Autoencoders
15 min·Jul 07, 2026
200
The One Mechanism That Turns Twenty AI Clones Into an Actual Team
EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales
Zhang, Xu, Dai et al. · Oregon State University; AG2AI·19 min·Jul 04, 2026
198
The Model That Knows the Answer and Can't Say It
Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale
Gollapudi, Gupta, Singhal et al. · UC Berkeley·17 min·Jul 03, 2026
197
Twin Problems Suggest AI Reasoning Gains Are Mostly Better Fact Recall
IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs
Abdaljalil, Serpedin, Kurban · Texas A&M University·17 min·Jul 03, 2026
194
How a Robot Builds a Debugging Notebook It Can Read, Edit, and Hand to Another Robot
ASPIRE: Agentic /Skills Discovery for Robotics
Lu, Wu, Kou et al. · NVIDIA·24 min·Jul 02, 2026
193
Freeze Most of the Network: Where RL Improvement Actually Lives in a Transformer
Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
Zhang, Hu, Glentis et al. · University of Minnesota·22 min·Jul 02, 2026
192
A 32B Open Model Matched Frontier Systems By Learning to Take Notes
AutoMem: Automated Learning of Memory as a Cognitive Skill
Wu, Zhu, Zhang et al. · Stanford University·22 min·Jul 02, 2026
189
Why Phone Agents Ace the Test and Crash on Your Actual Phone
Xiaomi-GUI-0 Technical Report
Team, Qu, Luan · Xiaomi·24 min·Jul 02, 2026
187
An 8-Billion-Parameter Agent That Beats Models 80 Times Its Size By Looking Things Up
An AI agent for treatment reasoning over a biomedical tool universe
Gao, Noori, Zhu et al. · Department of Biomedical Informatics·19 min·Jun 30, 2026
186
How a Frozen Model Went From 2% to 77% on Physics Puzzles — Without Retraining
Hierarchical Experimentalist Agents
Chandra, Vaidyanathan, Dhanuka et al. · University of Massachusetts Amherst·22 min·Jun 30, 2026
183
Why You Can't Fine-Tune Foresight Into an AI Agent
Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning
Zhang, Zhou, Qiao et al. · Fudan University / Shanghai Innovation Institute / Tencent Youtu Lab·23 min·Jun 29, 2026
180
The Bug Where Smart Assistants Read a Fact and Still Forget It
Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents
Patel · Vrin·24 min·Jun 29, 2026
178
How an AI Reviewer Learned to Stop Going Easy on AI Writing
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
Iacob, Jovanović, Shen et al. · University of Cambridge·23 min·Jun 26, 2026
173
The Free Step-Level Grader Hiding in Every RL Training Run
Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents
Oh, Li, Park et al. · University of Wisconsin–Madison·22 min·Jun 25, 2026
172
One Bad Token Can Sink a Model's Math, And You Can Delete It
Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning
Ko, Kang, Lee · Seoul National University·22 min·Jun 25, 2026
168
When Turning Experience Into Code Makes Your AI Agent Dumber
Metis: Bridging Text and Code Memory for Self-Evolving Agents
Dai, He, Li et al. · The Chinese University of Hong Kong·27 min·Jun 24, 2026
167
How Teaching an AI to Predict, Not Act, Made It a Better Actor
Qwen-AgentWorld: Language World Models for General Agents
Team, Zuo, Xiao et al.·27 min·Jun 24, 2026
166
A Router That Beats the Frontier Models It Calls
Sakana Fugu Technical Report
Tang, Cetin, Xu et al. · Sakana AI·26 min·Jun 23, 2026
165
A Free-Lunch Tweak That Lets a Tiny Agent Beat Frontier Giants
Group-Graph Policy Optimization for Long-Horizon Agentic Reinforcement Learning
Wang, Song, Zhang et al. · Peking University·22 min·Jun 23, 2026
163
Why Training Only on Perfect Solutions Cripples a Model's Reasoning
Provable Benefits of RLVR over SFT for Reasoning Models: Learning to Backtrack Efficiently
Wei, Kim · Princeton University·22 min·Jun 23, 2026
161
A Robot That Plays Before You Give It a Job, And Why That Beats Retrying
Playful Agentic Robot Learning
Zhang, Ge, Yoo et al. · University of California·19 min·Jun 19, 2026
160
Training an AI to Take Its Own Notes, So Its Future Self Works Better
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
Chen, Shi, Xie et al. · Alibaba Group·23 min·Jun 19, 2026
159
Can a Coding Agent Run Its Own Robot Experiments Overnight, With No Human Resetting the Scene?
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Xiao, Xie, Zhang et al. · NVIDIA·23 min·Jun 19, 2026
156
Why More Human Demonstrations Made a Computer-Use Agent Worse
ProCUA-SFT Technical Report
Jung, Lu, Cui et al. · NVIDIA / University of Washington·20 min·Jun 18, 2026
155
Why a Flawless Demo Makes a Worse Computer-Using Agent, And the Fix
Skill-Guided Continuation Distillation for GUI Agents
Fan, Yu, Shen et al. · StepFun·22 min·Jun 18, 2026
154
How a 7B Model Out-Investigates a 72B One by Choosing What to Look At
Native Active Perception as Reasoning for Omni-Modal Understanding
Xing, Xu, Wang et al. · The Chinese University of Hong Kong·21 min·Jun 18, 2026
152
Training a Model to Mean What It Says, And Why That Isn't the Same as Being Good
Self-CTRL: Self-Consistency Training with Reinforcement Learning
Pres, Ruis, Ghebreselassie et al. · MIT CSAIL·26 min·Jun 18, 2026
148
Why Letting an AI Watch Its Own Scoreboard Can Quietly Overwrite Its Safety
Greed Is Learned: Visible Incentives as Reward-Hacking Triggers
Che, Wu · NVIDIA Research·26 min·Jun 16, 2026
145
Building Forgetting Into a Language Model With One Extra Line of Code
Natively Unlearnable Large Language Models
Ghosal, Maini, Raghunathan · Carnegie Mellon University·22 min·Jun 15, 2026
142
Training a Tiny Model to Run the Plumbing Between an Agent and the World
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Wang, Wang, Taylor et al. · University of California·24 min·Jun 12, 2026
140
When a Reasoning Model Says "Let Me Double-Check" After It's Already Decided
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Scalena, Candussio, Bortolussi et al. · University of Groningen / University of Milano-Bicocca·27 min·Jun 12, 2026
128
How a Model Can Earn Full Reward and Still Resist Training
Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization
Xiao, Phuong · California Institute of Technology·29 min·Jun 11, 2026
127
What Diffusion Language Models Were Missing: A Map, Not an Algorithm
TextLDM: Language Modeling with Continuous Latent Diffusion
Jiang, Ren, Li et al. · JoyFuture Academy / HIT·30 min·Jun 11, 2026
126
How Coding Agents Can Mine Their Own Failures Into a Self-Targeting Curriculum
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
Xiao, Jiao, Wang et al. · Shanghai Jiao Tong University·21 min·Jun 09, 2026
120
How an AI Agent Rewrites Its Own Tools, Without an Answer Key
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Pan, Liu, Lin et al. · City University of Hong Kong·30 min·Jun 05, 2026
119
Beating Reinforcement Learning Without Ever Touching the Model's Weights
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents
Hwang, Suri, Villecroze et al. · Layer6 AI·22 min·Jun 05, 2026
116
Why Streaming Half a Reasoning Chain Beats Sending the Whole Thing
Streaming Communication in Multi-Agent Reasoning
Yang, Xu, Wang et al. · HKUST (GZ)·26 min·Jun 04, 2026
115
Teaching a Phone Agent to Reason Silently, And Keeping It Honest
MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models
Yang, Hu, Hao et al. · Beihang University·24 min·Jun 04, 2026
114
Agents That Rewrite Their Own Weights Instead of Just Taking Notes
Scaling Self-Evolving Agents via Parametric Memory
Ren, Luo, Yang et al. · Peking University / Alibaba Group·26 min·Jun 04, 2026
111
How a 4B Web Agent Beat Models 60x Its Size on 500 Demonstrations
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
Yang, Wu, Chen et al. · UIUC·24 min·Jun 03, 2026
110
How an Agent Got 44 Points Better by Mining Its Own Scratch Paper
Inducing Reasoning Primitives from Agent Traces
Lei, Yan, Momo et al. · Carnegie Mellon University·27 min·Jun 03, 2026
109
An AI Got Caught Reading the Answer Key, And Why That Catch Matters
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Chen, Shi, Li et al. · Shenzhen Institutes of Advanced Technology·28 min·Jun 03, 2026
108
The Reasoning Cliff: Why Thinking Longer Makes Models Worse at Exact Step-by-Step Tasks
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
Guo, Wu, Yiu · The University of Hong Kong·32 min·Jun 03, 2026
107
How a Market of Crippled AI Agents Outscored One Unrestricted Model
Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions
Qi, Su, Qu et al. · Harvard·26 min·Jun 03, 2026
106
Giving Agents a Notebook Instead of New Weights: How ExpGraph Lets Frozen Models Learn
ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents
Feng, Ye, Luo et al. · University of Illinois Urbana-Champaign·26 min·Jun 02, 2026
099
How an Open-Book Trick Teaches a Model to Catch Its Own Mistakes
Self-Trained Verification for Training- and Test-Time Self-Improvement
Wu, Raghunathan · Carnegie Mellon University·21 min·May 29, 2026
090
How MiniMax-M2 Bets That Sparsity Plus Verifiable Rewards Can Match Frontier Agents
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
MiniMax · MiniMax·28 min·May 27, 2026
088
Two Levers for Self-Improving AI: When Rewriting Code Isn't Enough
SIA: Self Improving AI with Harness & Weight Updates
Hebbar, Manawat, Verboomen et al. · Hexo Labs·25 min·May 27, 2026
085
Why Long-Context Models Might Need Compute, Not Capacity, Before Eviction
Language Models Need Sleep
Lee, McLeish, Goldstein et al. · Carnegie Mellon University·24 min·May 26, 2026
084
Terminal Agents Get Free Supervision From The Tokens We've Been Throwing Away
ECHO: Terminal Agents Learn World Models for Free
Shrivastava, Kauffmann, Awadallah et al. · Microsoft Research·26 min·May 26, 2026
083
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
Hu, Qian, Wang et al. · GSAI·24 min·May 26, 2026
082
Training a Deep Research Agent on 8,000 Synthetic Tasks: The Rubric Tree Trick
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
Xie, Lin, Wang et al. · The Ohio State University·31 min·May 26, 2026
081
When Reasoning Models Decide Before They Think: Detecting and Fixing Premature Confidence
Understanding and Mitigating Premature Confidence for Better LLM Reasoning
Gai, Zeng, Baek et al. · Carnegie Mellon University·25 min·May 26, 2026
080
How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
Wang, Lu, Wang et al. · The University of Hong Kong·32 min·May 26, 2026
079
An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models
Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
Chen, Xu, Zhao et al. · Tongji University / Shanghai AI Laboratory / Nanyang Technological University·29 min·May 25, 2026
078
Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Yang, Gong, Huang et al. · Microsoft·28 min·May 25, 2026
077
Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions
Xia, Wang, Tang et al. · State Key Laboratory of General Artificial Intelligence·22 min·May 25, 2026
074
How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
HRM-Text: Efficient Pretraining Beyond Scaling
Wang, Liu, Wang et al. · Sapient Intelligence·21 min·May 24, 2026
066
Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
Hu, Zhang, Xu et al. · Tongyi Lab·26 min·May 22, 2026
065
One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery
optimize_anything: A Universal API for Optimizing any Text Parameter
Agrawal, Lee, Tan et al. · UC Berkeley·27 min·May 22, 2026
064
When Agent Memory Stops Being a Database and Starts Being a Skill
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents
Ye, Liu, Wang et al. · University of Illinois Urbana-Champaign·30 min·May 22, 2026
060
When Splitting One Model Across Three Agents Doubles Its Accuracy
NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning
Lu, Fang, Zhong et al. · University of Georgia·26 min·May 20, 2026
059
Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward
Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs
Lu, Wang, Lu et al. · Northeastern University·22 min·May 20, 2026
053
An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
Pepe, Lin, Magka et al. · FAIR at Meta·32 min·May 18, 2026
052
An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
Look Before You Leap: Autonomous Exploration for LLM Agents
Ye, Shi, Liu et al. · University of Science and Technology of China / Meituan·23 min·May 18, 2026
048
How a 30B Open Model Reached Olympiad Gold With the Right Recipe
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Li, Zhan, Zhang et al. · Shanghai AI Laboratory / The Chinese University of Hong Kong·31 min·May 16, 2026
047
When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
Orchard: An Open-Source Agentic Modeling Framework
Peng, Yao, Wu et al. · Microsoft Research·28 min·May 15, 2026
046
When the AI Optimizer Edits the Grade Book: Why Harnessing Evolution Needs a Wall
Harnessing Agentic Evolution
Zhang, Gu, Ruan et al. · The Hong Kong University of Science and Technology (Guangzhou) / DeepWisdom·24 min·May 15, 2026
043
When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway
Negation Neglect: When models fail to learn negations in training
Mayne, McKinney, Dubiński et al. · University of Oxford·18 min·May 14, 2026
042
An Agentic Scientific Computing System That Actually Remembers What It Learns
GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms
Toscano, Chai, Karniadakis · Division of Applied Mathematics·30 min·May 13, 2026
041
When the Iteration Teaches the Model to Skip the Iteration
Solve the Loop: Attractor Models for Language and Reasoning
Fein-Ashley, Rashidinejad · University of Southern California·30 min·May 13, 2026
040
Two Frozen Models Learn to Whisper: Coupling Through Hidden States
The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
Flamant, Ghai, Shimizu · AWS Agentic AI·29 min·May 13, 2026
036
Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
Dehghankar, Asudeh · University of Illinois Chicago·24 min·May 11, 2026
033
Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
Sridhar, Johansen · California·24 min·May 11, 2026
032
A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning
Aviss · Fifth Dimension·23 min·May 09, 2026
028
Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
Recursive Agent Optimization
Gandhi, Chakraborty, Wang et al. · Carnegie Mellon University·23 min·May 08, 2026
026
What RL Actually Does to Language Models, at the Token Level
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
Akgül, Kannan, Neiswanger et al. · University of Southern California·24 min·May 08, 2026
025
The Missing Gradient Term That Predicts Sycophancy in RLHF
Explaining and Preventing Alignment Collapse in Iterative RLHF
Gauthier, Bach, Jordan · Inria·22 min·May 07, 2026
022
Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
Model Spec Midtraining: Improving How Alignment Training Generalizes
Li, Price, Marks et al. · Anthropic Fellows Program·32 min·May 06, 2026
021
Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Du, Ye, Tang et al. · Shanghai Jiao Tong University·14 min·May 06, 2026
019
When the Best Reward Model Trains the Worst Policy: Inside EvoLM
EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
Li, Xin, Xiao et al. · University of Washington·26 min·May 06, 2026
013
Why Search Keeps Rediscovering the Same Workflow, and What That Means
Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors
Du, Liu, Du et al. · Carnegie Mellon University·22 min·May 03, 2026
011
When RL Actually Teaches Agents Something New, And When It Doesn't
Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis
Zhai, Yan, Shao et al. · Fudan University·23 min·May 02, 2026
010
When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
RAGEN-2: Reasoning Collapse in Agentic RL
Wang, Gui, Jin et al. · Northwestern University·22 min·May 02, 2026
009
How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers
SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
Limozin, Durech, Hoefler et al. · ETH AI Center·23 min·May 02, 2026
003
How to Pick the Best of Sixteen Coding Agent Rollouts
Scaling Test-Time Compute for Agentic Coding
Kim, Yang, Niu et al. · Meta Superintelligence Labs / University of Washington·17 min·May 01, 2026

Worth reading next

Papers on this topic that we haven't covered in a deep dive yet but would recommend.

Deep Equilibrium Models
Universal Transformers
LoRA: Low-Rank Adaptation of Large Language Models