Concept · 24 episodes

Knowledge Distillation

Definition

Knowledge distillation trains a smaller “student” model to mimic the outputs of a larger “teacher,” producing a much cheaper model that retains a large fraction of the teacher’s capability. It’s the standard way labs convert a flagship model into a deployable lineup.
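To make "mimic the outputs" concrete, here is a minimal sketch of one common distillation objective: the student matches the teacher's temperature-softened output distribution, blended with ordinary cross-entropy on the true label. This is an illustration, not the recipe of any particular lab or of any episode listed below. The function names, `temperature=2.0`, and `alpha=0.5` are assumptions chosen for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term (match the teacher) with a hard-label term.

    temperature and alpha are illustrative defaults, not values from the source.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    soft = temperature ** 2 * kl_divergence(p_teacher, p_student)
    # Standard cross-entropy against the ground-truth class at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss([2.0, 1.0, 0.1], teacher, label=0))  # student agrees with teacher
    print(distillation_loss([0.1, 1.0, 2.0], teacher, label=0))  # student disagrees
```

The first call should print a smaller loss than the second, because a student whose logits already match the teacher's contributes nothing to the soft-target term. In practice the same loss would be computed per batch in an autodiff framework and minimized over the student's weights only, with the teacher held fixed.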

Episodes covering this

206
How Four-Second Clips Become Hours of Playable AI Soccer
Multiplayer Interactive World Models with Representation Autoencoders
15 min·Jul 07, 2026
200
The One Mechanism That Turns Twenty AI Clones Into an Actual Team
EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales
Zhang, Xu, Dai et al. · Oregon State University; AG2AI·19 min·Jul 04, 2026
192
A 32B Open Model Matched Frontier Systems By Learning to Take Notes
AutoMem: Automated Learning of Memory as a Cognitive Skill
Wu, Zhu, Zhang et al. · Stanford University·22 min·Jul 02, 2026
189
Why Phone Agents Ace the Test and Crash on Your Actual Phone
Xiaomi-GUI-0 Technical Report
Team, Qu, Luan · Xiaomi·24 min·Jul 02, 2026
186
How a Frozen Model Went From 2% to 77% on Physics Puzzles — Without Retraining
Hierarchical Experimentalist Agents
Chandra, Vaidyanathan, Dhanuka et al. · University of Massachusetts Amherst·22 min·Jun 30, 2026
168
When Turning Experience Into Code Makes Your AI Agent Dumber
Metis: Bridging Text and Code Memory for Self-Evolving Agents
Dai, He, Li et al. · The Chinese University of Hong Kong·27 min·Jun 24, 2026
163
Why Training Only on Perfect Solutions Cripples a Model's Reasoning
Provable Benefits of RLVR over SFT for Reasoning Models: Learning to Backtrack Efficiently
Wei, Kim · Princeton University·22 min·Jun 23, 2026
156
Why More Human Demonstrations Made a Computer-Use Agent Worse
ProCUA-SFT Technical Report
Jung, Lu, Cui et al. · NVIDIA / University of Washington·20 min·Jun 18, 2026
155
Why a Flawless Demo Makes a Worse Computer-Using Agent, And the Fix
Skill-Guided Continuation Distillation for GUI Agents
Fan, Yu, Shen et al. · StepFun·22 min·Jun 18, 2026
145
Building Forgetting Into a Language Model With One Extra Line of Code
Natively Unlearnable Large Language Models
Ghosal, Maini, Raghunathan · Carnegie Mellon University·22 min·Jun 15, 2026
142
Training a Tiny Model to Run the Plumbing Between an Agent and the World
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Wang, Wang, Taylor et al. · University of California·24 min·Jun 12, 2026
127
What Diffusion Language Models Were Missing: A Map, Not an Algorithm
TextLDM: Language Modeling with Continuous Latent Diffusion
Jiang, Ren, Li et al. · JoyFuture Academy / HIT·30 min·Jun 11, 2026
126
How Coding Agents Can Mine Their Own Failures Into a Self-Targeting Curriculum
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
Xiao, Jiao, Wang et al. · Shanghai Jiao Tong University·21 min·Jun 09, 2026
111
How a 4B Web Agent Beat Models 60x Its Size on 500 Demonstrations
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
Yang, Wu, Chen et al. · UIUC·24 min·Jun 03, 2026
106
Giving Agents a Notebook Instead of New Weights: How ExpGraph Lets Frozen Models Learn
ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents
Feng, Ye, Luo et al. · University of Illinois Urbana-Champaign·26 min·Jun 02, 2026
099
How an Open-Book Trick Teaches a Model to Catch Its Own Mistakes
Self-Trained Verification for Training- and Test-Time Self-Improvement
Wu, Raghunathan · Carnegie Mellon University·21 min·May 29, 2026
083
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
Hu, Qian, Wang et al. · GSAI·24 min·May 26, 2026
082
Training a Deep Research Agent on 8,000 Synthetic Tasks: The Rubric Tree Trick
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
Xie, Lin, Wang et al. · The Ohio State University·31 min·May 26, 2026
078
Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Yang, Gong, Huang et al. · Microsoft·28 min·May 25, 2026
066
Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
Hu, Zhang, Xu et al. · Tongyi Lab·26 min·May 22, 2026
047
When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
Orchard: An Open-Source Agentic Modeling Framework
Peng, Yao, Wu et al. · Microsoft Research·28 min·May 15, 2026
041
When the Iteration Teaches the Model to Skip the Iteration
Solve the Loop: Attractor Models for Language and Reasoning
Fein-Ashley, Rashidinejad · University of Southern California·30 min·May 13, 2026
017
When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers
Gym-Anything: Turn any Software into an Agent Environment
Aggarwal, Neubig, Welleck · CMU·31 min·May 03, 2026
013
Why Search Keeps Rediscovering the Same Workflow, and What That Means
Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors
Du, Liu, Du et al. · Carnegie Mellon University·22 min·May 03, 2026