Episode Archive
Every paper, one deep dive at a time.
The complete catalogue. Newest first.
- 079 · An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models · Chen, Xu, Zhao et al. · Tongji University / Shanghai AI Laboratory / Nanyang Technological University · 29 min · May 25, 2026
- 078 · Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training · Yang, Gong, Huang et al. · Microsoft · 28 min · May 25, 2026
- 077 · Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It · Xia, Wang, Tang et al. · State Key Laboratory of General Artificial Intelligence · 22 min · May 25, 2026
- 076 · Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math · Zhao, Yuan, Choi et al. · Georgia Institute of Technology · 22 min · May 25, 2026
- 075 · Growing Code and Proof Together: Verified Systems in Ten Hours Instead of a Year · Agarwal, Krentsel, Liu et al. · UC Berkeley · 28 min · May 25, 2026
- 074 · How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning · Wang, Liu, Wang et al. · Sapient Intelligence · 21 min · May 24, 2026
- 073 · When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving · Kong, Lai, Piao et al. · University of Toronto · 28 min · May 23, 2026
- 072 · A Robot Made Graphene Without Help, And Caught Itself Hallucinating · Shi, Zheng, Juan et al. · Princeton University · 29 min · May 23, 2026
- 071 · When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface · Xu, Wen, Li · Peking University · 23 min · May 22, 2026
- 070 · When Models Know the Answer But Say the Wrong Thing Anyway · Yeom, Sok, Kim et al. · Graduate School of Data Science · 22 min · May 22, 2026
- 069 · When Smarter Models Forecast Worse: The Hidden Failure Mode in LLM Predictions · Merrill, Lee, Karger · Forecasting Research Institute / UC Berkeley · 30 min · May 22, 2026
- 068 · The OS Trick That Makes Tree Search Practical for Coding Agents · Dong, He, Hou et al. · Institute of Parallel and Distributed Systems · 27 min · May 22, 2026
- 067 · An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won · Tsoukalas, Kovsharov, Shirobokov et al. · Google DeepMind · 31 min · May 22, 2026
- 066 · Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer · Hu, Zhang, Xu et al. · Tongyi Lab · 26 min · May 22, 2026
- 065 · One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery · Agrawal, Lee, Tan et al. · UC Berkeley · 27 min · May 22, 2026
- 064 · When Agent Memory Stops Being a Database and Starts Being a Skill · Ye, Liu, Wang et al. · University of Illinois Urbana-Champaign · 30 min · May 22, 2026
- 063 · Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency · Winston, Wang, Mirhoseini et al. · Stanford University · 26 min · May 21, 2026
- 062 · Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety · Zhang, Zheng, Yang · Shenzhen University · 24 min · May 20, 2026
- 061 · When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This · Jha, Triedman, Bhattacharya et al. · Cornell University · 27 min · May 20, 2026
- 060 · When Splitting One Model Across Three Agents Doubles Its Accuracy · Lu, Fang, Zhong et al. · University of Georgia · 26 min · May 20, 2026
- 059 · Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward · Lu, Wang, Lu et al. · Northeastern University · 22 min · May 20, 2026
- 058 · Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe · Liu, Holz, Ye et al. · University of Chinese Academy of Sciences · 32 min · May 19, 2026
- 057 · How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack · Li, Hu, Xu et al. · Uber Technologies · 28 min · May 19, 2026
- 055 · Why LLM Judges Flip Their Verdicts When You Change the Question Format · Feldhus, Baeumel, Golimblevskaia et al. · Technische Universität Berlin / BIFOLD · 26 min · May 19, 2026
- 054 · When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window · Haskins, Chughtai, Engels · University of Canterbury · 26 min · May 18, 2026
- 053 · An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script · Pepe, Lin, Magka et al. · FAIR at Meta · 32 min · May 18, 2026
- 052 · An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents · Ye, Shi, Liu et al. · University of Science and Technology of China / Meituan · 23 min · May 18, 2026
- 051 · Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead · Zhang, Su, Chen et al. · MiroMind AI · 22 min · May 18, 2026
- 049 · An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked · Cuadros, Maiga · Digital Epidemiology Laboratory · 28 min · May 17, 2026
- 048 · How a 30B Open Model Reached Olympiad Gold With the Right Recipe · Li, Zhan, Zhang et al. · Shanghai AI Laboratory / The Chinese University of Hong Kong · 31 min · May 16, 2026
- 047 · When Agent Benchmarks Lie: The Harness Problem in Open-Source AI · Peng, Yao, Wu et al. · Microsoft Research · 28 min · May 15, 2026
- 046 · When the AI Optimizer Edits the Grade Book: Why Harnessing Evolution Needs a Wall · Zhang, Gu, Ruan et al. · The Hong Kong University of Science and Technology (Guangzhou) / DeepWisdom · 24 min · May 15, 2026
- 045 · When a Frontier Model Talks Its Own Twin Into Climate Denial · Nogueira, Almeida, Bonás et al. · Maritaca AI · 31 min · May 15, 2026
- 044 · How One Sentence and a Forged History Flip the Most Aligned Models · Salgado · Independent Researcher · 23 min · May 15, 2026
- 043 · When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway · Mayne, McKinney, Dubiński et al. · University of Oxford · 18 min · May 14, 2026
- 042 · An Agentic Scientific Computing System That Actually Remembers What It Learns · Toscano, Chai, Karniadakis · Division of Applied Mathematics · 30 min · May 13, 2026
- 041 · When the Iteration Teaches the Model to Skip the Iteration · Fein-Ashley, Rashidinejad · University of Southern California · 30 min · May 13, 2026
- 040 · Two Frozen Models Learn to Whisper: Coupling Through Hidden States · Flamant, Ghai, Shimizu · AWS Agentic AI · 29 min · May 13, 2026
- 039 · When Smarter Agents Get Fooled by Three Extra Nodes in a Database · Kereopa-Yorke, Diaz, Wright et al. · Microsoft · 31 min · May 12, 2026
- 038 · How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial · Sun, Kong, Zhang et al. · Northeastern University · 23 min · May 12, 2026
- 037 · Why Hallucination Detectors Miss Stale Facts: A Geometric Story About What Models Know But Don't Say · Elbadry, Heakl, Zhang et al. · Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) · 27 min · May 12, 2026
- 036 · Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead. · Dehghankar, Asudeh · University of Illinois Chicago · 24 min · May 11, 2026
- 035 · Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment · Gulati, Gupta, Lumer et al. · PricewaterhouseCoopers U.S. · 29 min · May 11, 2026
- 034 · Catching Multi-Agent Deadlocks Before Deployment With a 40-Year-Old Tool · Xia, Li, Ehsan et al. · Rutgers University · 30 min · May 11, 2026
- 033 · Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval · Sridhar, Johansen · California · 24 min · May 11, 2026
- 032 · A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking · Aviss · Fifth Dimension · 23 min · May 09, 2026
- 031 · When Your AI Assistant Won't Let Go of Old Facts About You · Chao, Bai, Sheng et al. · Wuhan University · 24 min · May 09, 2026
- 030 · Why Your AI Agent Won't Stop Working — and Each Model Falls for a Different Trap · Xu, Wang, Zhang et al. · Zhejiang University · 30 min · May 09, 2026
- 029 · Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper · Zheng, Glehn, Zwols et al. · Google DeepMind · 20 min · May 08, 2026
- 028 · Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization · Gandhi, Chakraborty, Wang et al. · Carnegie Mellon University · 23 min · May 08, 2026
- 027 · When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure · Kamahori, Li, Peter et al. · University of Washington · 30 min · May 08, 2026
- 026 · What RL Actually Does to Language Models, at the Token Level · Akgül, Kannan, Neiswanger et al. · University of Southern California · 24 min · May 08, 2026
- 025 · The Missing Gradient Term That Predicts Sycophancy in RLHF · Gauthier, Bach, Jordan · Inria · 22 min · May 07, 2026
- 024 · An AI Agent That Found 28 Zero-Days in Windows — And What Made It Work · Lee, Kim, Zhang · University of Illinois at Urbana-Champaign · 22 min · May 07, 2026
- 023 · Why a Small Agent Confidently Overwrites Memories It Doesn't Understand · Mao, Zhao, Penn et al. · City University of Hong Kong · 23 min · May 07, 2026
- 022 · Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap · Li, Price, Marks et al. · Anthropic Fellows Program · 32 min · May 06, 2026
- 021 · Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents · Du, Ye, Tang et al. · Shanghai Jiao Tong University · 14 min · May 06, 2026
- 020 · The Compliance Gap: Why AI Says Yes and Does No · Shin · Polymath Minds AI Lab · 28 min · May 06, 2026
- 019 · When the Best Reward Model Trains the Worst Policy: Inside EvoLM · Li, Xin, Xiao et al. · University of Washington · 26 min · May 06, 2026
- 018 · Language Models Compute the Rational Move, Then Override It · Lekeas, Stamatopoulos · DreamWorks Animation · 29 min · May 03, 2026
- 017 · When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers · Aggarwal, Neubig, Welleck · CMU · 31 min · May 03, 2026
- 016 · Why Your Coding Agent Stalls While the GPU Runs Hot · Wang, Ye, Xu et al. · Duke University · 24 min · May 03, 2026
- 015 · The Audit Number Isn't What You Think: Sycophancy and the Case Against Single-Prompt Bias Tests · Törnberg, Schimmel · Institute of Logic · 21 min · May 03, 2026
- 014 · Why a Constrained Pipeline Beat a Full Coding Agent at Finding Bugs 30-to-1 · Shafiuzzaman, Desai, Guo et al. · University of California · 32 min · May 03, 2026
- 013 · Why Search Keeps Rediscovering the Same Workflow, and What That Means · Du, Liu, Du et al. · Carnegie Mellon University · 22 min · May 03, 2026
- 012 · Why AI Coding Agents Keep Trying to Debug Without a Debugger · Liu, Wang, Chen et al. · Sun Yat-sen University · 21 min · May 02, 2026
- 011 · When RL Actually Teaches Agents Something New, And When It Doesn't · Zhai, Yan, Shao et al. · Fudan University · 23 min · May 02, 2026
- 010 · When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL · Wang, Gui, Jin et al. · Northwestern University · 22 min · May 02, 2026
- 009 · How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers · Limozin, Durech, Hoefler et al. · ETH AI Center · 23 min · May 02, 2026
- 008 · Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps · Wang, Gooding, Hartmann et al. · Google DeepMind · 24 min · May 02, 2026
- 007 · Exploration Hacking: When Models Sabotage Their Own RL Training · Jang, Falck, Braun et al. · MATS · 23 min · May 02, 2026
- 006 · What Happens Inside Claude When It Decides to Blackmail Someone · Sofroniew, Kauvar, Saunders et al. · Anthropic · 22 min · May 02, 2026
- 005 · Why a Debugger Designed for Humans Is the Wrong Tool for an AI Agent · Xiang, Xu, Chu et al. · Southern University of Science and Technology · 22 min · May 01, 2026
- 004 · The Sycophancy Circuit That Survives Alignment Training · Pandey · Georgia Institute of Technology · 29 min · May 01, 2026
- 003 · How to Pick the Best of Sixteen Coding Agent Rollouts · Kim, Yang, Niu et al. · Meta Superintelligence Labs / University of Washington · 17 min · May 01, 2026
- 002 · An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light · Yang, Chen, Zhao et al. · Zhejiang University · 29 min · May 01, 2026
- 001 · When AI Models Quietly Protect Each Other From Shutdown · Potter, Crispino, Siu et al. · University of California · 25 min · May 01, 2026