Theme · 15 episodes

AI Efficiency & Cost

Definition

AI efficiency covers techniques for reducing the compute, memory, energy, or latency of AI systems at a given capability level: quantization, distillation, sparsity, better serving stacks, and smarter scheduling. As models become more useful, efficiency increasingly determines what is deployable, not just what is possible.

Episodes covering this

  1. 077
    Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
    Xia, Wang, Tang et al. · State Key Laboratory of General Artificial Intelligence · 22 min · May 25, 2026
  2. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
    Wang, Liu, Wang et al. · Sapient Intelligence · 21 min · May 24, 2026
  3. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
    Xu, Wen, Li · Peking University · 23 min · May 22, 2026
  4. 063
    Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
    Winston, Wang, Mirhoseini et al. · Stanford University · 26 min · May 21, 2026
  5. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
    Pepe, Lin, Magka et al. · FAIR at Meta · 32 min · May 18, 2026
  6. 051
    Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
    Zhang, Su, Chen et al. · MiroMind AI · 22 min · May 18, 2026
  7. 041
    When the Iteration Teaches the Model to Skip the Iteration
    Fein-Ashley, Rashidinejad · University of Southern California · 30 min · May 13, 2026
  8. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
    Flamant, Ghai, Shimizu · AWS Agentic AI · 29 min · May 13, 2026
  9. 036
    Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
    Dehghankar, Asudeh · University of Illinois Chicago · 24 min · May 11, 2026
  10. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
    Sridhar, Johansen · California · 24 min · May 11, 2026
  11. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
    Aviss · Fifth Dimension · 23 min · May 09, 2026
  12. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
    Gandhi, Chakraborty, Wang et al. · Carnegie Mellon University · 23 min · May 08, 2026
  13. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
    Kamahori, Li, Peter et al. · University of Washington · 30 min · May 08, 2026
  14. 026
    What RL Actually Does to Language Models, at the Token Level
    Akgül, Kannan, Neiswanger et al. · University of Southern California · 24 min · May 08, 2026
  15. 005
    Why a Debugger Designed for Humans Is the Wrong Tool for an AI Agent
    Xiang, Xu, Chu et al. · Southern University of Science and Technology · 22 min · May 01, 2026