Glossary · Term

gradient

← all terms

Definition

The direction to nudge a model's weights to make it do better next time.

The vector of partial derivatives of a loss with respect to parameters, used by optimizers to update weights during training.

Also called: gradients

Mentioned in 25 episodes

  1. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
  2. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
  3. 067
    An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
  4. 065
    One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery
  5. 060
    When Splitting One Model Across Three Agents Doubles Its Accuracy
  6. 054
    When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window
  7. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
  8. 051
    Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
  9. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe
  10. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
  11. 042
    An Agentic Scientific Computing System That Actually Remembers What It Learns
  12. 041
    When the Iteration Teaches the Model to Skip the Iteration
  13. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
  14. 039
    When Smarter Agents Get Fooled by Three Extra Nodes in a Database
  15. 038
    How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial
  16. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
  17. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
  18. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
  19. 026
    What RL Actually Does to Language Models, at the Token Level
  20. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF
  21. 020
    The Compliance Gap: Why AI Says Yes and Does No
  22. 010
    When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
  23. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers
  24. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training
  25. 001
    When AI Models Quietly Protect Each Other From Shutdown

Related concepts