Concept · 22 episodes

Self-Play / Self-Evolution

Definition

Self-play trains a model by having it compete against current or earlier versions of itself (in games, dialog, or debate), so that the improving opponent serves as an automatic curriculum. It is how AlphaZero learned chess, and the template recurs wherever a reward is hard to specify but a win/loss outcome is cheap to check.
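To make the loop concrete, here is a minimal sketch of self-play on a toy game. It is not taken from any of the papers listed below. The game (single-pile Nim, where whoever takes the last stone wins), the tabular value update, and every name and hyperparameter in it (`train_self_play`, `greedy_policy`, `alpha`, `epsilon`) are assumptions chosen for illustration. Both players read and write the same table, so the opponent gets stronger exactly as fast as the learner does.

```python
import random


def train_self_play(max_pile=12, max_take=3, episodes=20000,
                    alpha=0.5, epsilon=0.2, seed=0):
    """Learn single-pile Nim by playing against the same value table.

    q[pile][take] estimates the final outcome (+1 win, -1 loss) for the
    player who is about to move. All names here are illustrative.
    """
    rng = random.Random(seed)
    q = {pile: {take: 0.0 for take in range(1, min(max_take, pile) + 1)}
         for pile in range(1, max_pile + 1)}

    for _ in range(episodes):
        # Random start positions so every pile size gets visited.
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = q[pile]
            if rng.random() < epsilon:
                take = rng.choice(list(moves))    # explore
            else:
                take = max(moves, key=moves.get)  # exploit current table
            remaining = pile - take
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next using the same table, so its
                # best outcome is our worst: negate its best value.
                target = -max(q[remaining].values())
            moves[take] += alpha * (target - moves[take])
            pile = remaining  # the turn passes to the "other" self
    return q


def greedy_policy(q):
    """Return the highest-valued move for each pile size."""
    return {pile: max(moves, key=moves.get) for pile, moves in q.items()}


if __name__ == "__main__":
    policy = greedy_policy(train_self_play())
    for pile in sorted(policy):
        print(pile, "->", policy[pile])
```

Only the win/loss signal at the end of a game is specified. Under these assumptions the learned policy should recover the known strategy for this game, which is to take `pile % 4` stones whenever that is nonzero and so leave the opponent a multiple of four. Systems such as AlphaZero replace the table with a neural network and add tree search, which this sketch does not attempt.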

Episodes covering this

207
An AI Graded Its Own Math Test 94 Percent — It Actually Scored 20
More Convincing, Not More Correct: Self-Play Reward Hacking of Reference-Free LLM Judges
12 min·Jul 08, 2026
200
The One Mechanism That Turns Twenty AI Clones Into an Actual Team
EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales
Zhang, Xu, Dai et al. · Oregon State University; AG2AI·19 min·Jul 04, 2026
178
How an AI Reviewer Learned to Stop Going Easy on AI Writing
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
Iacob, Jovanović, Shen et al. · University of Cambridge·23 min·Jun 26, 2026
166
A Router That Beats the Frontier Models It Calls
Sakana Fugu Technical Report
Tang, Cetin, Xu et al. · Sakana AI·26 min·Jun 23, 2026
161
A Robot That Plays Before You Give It a Job, And Why That Beats Retrying
Playful Agentic Robot Learning
Zhang, Ge, Yoo et al. · University of California·19 min·Jun 19, 2026
159
Can a Coding Agent Run Its Own Robot Experiments Overnight, With No Human Resetting the Scene?
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Xiao, Xie, Zhang et al. · NVIDIA·23 min·Jun 19, 2026
129
How a Crowd of Anonymous AI Agents Broke a 40-Year Math Record
Harnessing the Collective Intelligence of AI Agents in the Wild for New Discoveries
Bianchi, Kwon, Pappu et al. · Together AI·29 min·Jun 11, 2026
126
How Coding Agents Can Mine Their Own Failures Into a Self-Targeting Curriculum
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
Xiao, Jiao, Wang et al. · Shanghai Jiao Tong University·21 min·Jun 09, 2026
120
How an AI Agent Rewrites Its Own Tools, Without an Answer Key
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Pan, Liu, Lin et al. · City University of Hong Kong·30 min·Jun 05, 2026
114
Agents That Rewrite Their Own Weights Instead of Just Taking Notes
Scaling Self-Evolving Agents via Parametric Memory
Ren, Luo, Yang et al. · Peking University / Alibaba Group·26 min·Jun 04, 2026
112
When an AI Agent Cheats Without Being Told: Inside the Meta-Agent Challenge
The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?
Lu, Wang, Wang et al. · Institute of Software·22 min·Jun 04, 2026
109
An AI Got Caught Reading the Answer Key, And Why That Catch Matters
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Chen, Shi, Li et al. · Shenzhen Institutes of Advanced Technology·28 min·Jun 03, 2026
107
How a Market of Crippled AI Agents Outscored One Unrestricted Model
Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions
Qi, Su, Qu et al. · Harvard·26 min·Jun 03, 2026
095
Seven Wins to Zero: How Organizing AI Agents Like a Lab Changes the Search
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation
Gao, Fang, Zitnik · Harvard University·24 min·May 28, 2026
090
How MiniMax-M2 Bets That Sparsity Plus Verifiable Rewards Can Match Frontier Agents
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
MiniMax · MiniMax·28 min·May 27, 2026
078
Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Yang, Gong, Huang et al. · Microsoft·28 min·May 25, 2026
072
A Robot Made Graphene Without Help, And Caught Itself Hallucinating
Qumus: Realization of An Embodied AI Quantum Material Experimentalist
Shi, Zheng, Juan et al. · Princeton University·29 min·May 23, 2026
065
One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery
optimize_anything: A Universal API for Optimizing any Text Parameter
Agrawal, Lee, Tan et al. · UC Berkeley·27 min·May 22, 2026
057
How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
ADR: An Agentic Detection System for Enterprise Agentic AI Security
Li, Hu, Xu et al. · Uber Technologies·28 min·May 19, 2026
053
An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
Pepe, Lin, Magka et al. · FAIR at Meta·32 min·May 18, 2026
042
An Agentic Scientific Computing System That Actually Remembers What It Learns
GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms
Toscano, Chai, Karniadakis · Division of Applied Mathematics·30 min·May 13, 2026
019
When the Best Reward Model Trains the Worst Policy: Inside EvoLM
EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
Li, Xin, Xiao et al. · University of Washington·26 min·May 06, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.