Theme · 35 episodes

AI Efficiency & Cost

Definition

AI efficiency covers techniques for reducing the compute, memory, energy, or latency an AI system needs at a given capability level: quantization, distillation, sparsity, better serving stacks, and smarter scheduling. As models become more useful, efficiency increasingly determines what can actually be deployed, not just what is possible.
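As a rough illustration of one technique named above, here is a minimal sketch of symmetric per-tensor int8 quantization. It assumes a toy list of weights; the function names (`quantize_int8`, `dequantize_int8`) and the sample values are hypothetical and are not taken from any episode or paper listed here. It only shows the basic trade: about a quarter of the memory of 32-bit floats in exchange for a bounded rounding error.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: int8 codes plus one float scale."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


# Hypothetical toy weights stored as 32-bit floats.
weights = array("f", [0.12, -0.53, 0.98, -1.27, 0.0, 0.41])

codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

bytes_fp32 = len(weights) * weights.itemsize  # storage for the float32 weights
bytes_int8 = len(codes) * codes.itemsize      # storage for the int8 codes
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {bytes_fp32} bytes, int8: {bytes_int8} bytes")
print(f"scale: {scale:.6f}, max round-trip error: {max_error:.6f}")
```

The round-trip error per weight is at most half the scale, which is why quantization tends to work when weight magnitudes are reasonably uniform. Production schemes typically use per-channel or per-group scales, which this sketch omits.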

Episodes covering this

201
One in Four NeurIPS Papers Cites a Reference That Doesn't Exist
Phantom References: Hallucinated Citations That Survive Peer Review at Top-Tier Conferences
Russinovich, Kumar, Salem · Microsoft·19 min·Jul 06, 2026
193
Freeze Most of the Network: Where RL Improvement Actually Lives in a Transformer
Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
Zhang, Hu, Glentis et al. · University of Minnesota·22 min·Jul 02, 2026
192
A 32B Open Model Matched Frontier Systems By Learning to Take Notes
AutoMem: Automated Learning of Memory as a Cognitive Skill
Wu, Zhu, Zhang et al. · Stanford University·22 min·Jul 02, 2026
179
How DeepSeek Made One User Faster Without Slowing Down the Crowd
DSpark: Confidence-Scheduled Speculative Decoding with
Xin Cheng, Xingkai Yu, Chenze Shao et al. · Peking University / DeepSeek-AI·23 min·Jun 27, 2026
170
When a One-Liner Beats Your Agent's Clever Verification Logic
Bayesian control for coding agents
Papamarkou, Smirnov, Mazanov et al. · PolyShape / National Technical University of Athens·26 min·Jun 24, 2026
169
Why Better Bug Reports Can Make AI Coding Agents Worse
SHERLOC: Structured Diagnostic Localization for Code Repair Agents
Tamoyan, Narenthiran, Arakelyan et al. · NVIDIA / TU Darmstadt·24 min·Jun 24, 2026
166
A Router That Beats the Frontier Models It Calls
Sakana Fugu Technical Report
Tang, Cetin, Xu et al. · Sakana AI·26 min·Jun 23, 2026
165
A Free-Lunch Tweak That Lets a Tiny Agent Beat Frontier Giants
Group-Graph Policy Optimization for Long-Horizon Agentic Reinforcement Learning
Wang, Song, Zhang et al. · Peking University·22 min·Jun 23, 2026
154
How a 7B Model Out-Investigates a 72B One by Choosing What to Look At
Native Active Perception as Reasoning for Omni-Modal Understanding
Xing, Xu, Wang et al. · The Chinese University of Hong Kong·21 min·Jun 18, 2026
142
Training a Tiny Model to Run the Plumbing Between an Agent and the World
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Wang, Wang, Taylor et al. · University of California·24 min·Jun 12, 2026
141
How Two Tokens Reopened a Reasoning Method the Field Had Given Up On
Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning
Yang, Chen, Wu et al. · HKUST (GZ)·29 min·Jun 12, 2026
130
Why AI Agents Coordinate Better Through a Shared Board Than a Boss
Decentralized Multi-Agent Systems with Shared Context
Mao, Mirhoseini · Stanford University·34 min·Jun 11, 2026
127
What Diffusion Language Models Were Missing: A Map, Not an Algorithm
TextLDM: Language Modeling with Continuous Latent Diffusion
Jiang, Ren, Li et al. · JoyFuture Academy / HIT·30 min·Jun 11, 2026
119
Beating Reinforcement Learning Without Ever Touching the Model's Weights
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents
Hwang, Suri, Villecroze et al. · Layer6 AI·22 min·Jun 05, 2026
117
How an Open AI System Verified 672 Hard Math Proofs for Under $300
Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement
Chung, Cai, Li et al. · Princeton University·26 min·Jun 05, 2026
116
Why Streaming Half a Reasoning Chain Beats Sending the Whole Thing
Streaming Communication in Multi-Agent Reasoning
Yang, Xu, Wang et al. · HKUST (GZ)·26 min·Jun 04, 2026
115
Teaching a Phone Agent to Reason Silently, And Keeping It Honest
MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models
Yang, Hu, Hao et al. · Beihang University·24 min·Jun 04, 2026
106
Giving Agents a Notebook Instead of New Weights: How ExpGraph Lets Frozen Models Learn
ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents
Feng, Ye, Luo et al. · University of Illinois Urbana-Champaign·26 min·Jun 02, 2026
100
How a Prompt Wrapper Lets a Frontier Model Play Poker Like an Expert
PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers
Li, Wang, Huang · IIIS·29 min·May 29, 2026
085
Why Long-Context Models Might Need Compute, Not Capacity, Before Eviction
Language Models Need Sleep
Lee, McLeish, Goldstein et al. · Carnegie Mellon University·24 min·May 26, 2026
077
Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions
Xia, Wang, Tang et al. · State Key Laboratory of General Artificial Intelligence·22 min·May 25, 2026
074
How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
HRM-Text: Efficient Pretraining Beyond Scaling
Wang, Liu, Wang et al. · Sapient Intelligence·21 min·May 24, 2026
071
When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents
Xu, Wen, Li · Peking University·23 min·May 22, 2026
063
Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
Winston, Wang, Mirhoseini et al. · Stanford University·26 min·May 21, 2026
053
An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
Pepe, Lin, Magka et al. · FAIR at Meta·32 min·May 18, 2026
051
Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
Argus: Evidence Assembly for Scalable Deep Research Agents
Zhang, Su, Chen et al. · MiroMind AI·22 min·May 18, 2026
041
When the Iteration Teaches the Model to Skip the Iteration
Solve the Loop: Attractor Models for Language and Reasoning
Fein-Ashley, Rashidinejad · University of Southern California·30 min·May 13, 2026
040
Two Frozen Models Learn to Whisper: Coupling Through Hidden States
The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
Flamant, Ghai, Shimizu · AWS Agentic AI·29 min·May 13, 2026
036
Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
Dehghankar, Asudeh · University of Illinois Chicago·24 min·May 11, 2026
033
Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
Sridhar, Johansen · California·24 min·May 11, 2026
032
A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning
Aviss · Fifth Dimension·23 min·May 09, 2026
028
Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
Recursive Agent Optimization
Gandhi, Chakraborty, Wang et al. · Carnegie Mellon University·23 min·May 08, 2026
027
When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
Kamahori, Li, Peter et al. · University of Washington·30 min·May 08, 2026
026
What RL Actually Does to Language Models, at the Token Level
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
Akgül, Kannan, Neiswanger et al. · University of Southern California·24 min·May 08, 2026
005
Why a Debugger Designed for Humans Is the Wrong Tool for an AI Agent
Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis
Xiang, Xu, Chu et al. · Southern University of Science and Technology·22 min·May 01, 2026