Glossary · Term

trajectory


Definition

The full record of what an AI agent did from start to finish on a task.

A sequence of states, actions, and observations produced by an agent over the course of a task, used as the unit of training data in agentic reinforcement learning (RL).
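As a rough illustration of the definition above, a trajectory can be modeled as an ordered list of (state, action, observation) steps for one task. This is a minimal sketch under assumed names: `Step`, `Trajectory`, `record`, and the toy file-listing task are hypothetical and do not reflect any particular framework's trajectory format. Real agentic RL pipelines typically attach more data, such as rewards or token-level log-probabilities.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    """One timestep: what the agent saw, what it did, and what came back."""
    state: Any        # hypothetical: e.g. the agent's context at this point
    action: Any       # hypothetical: e.g. a tool call or text the agent emitted
    observation: Any  # hypothetical: e.g. the environment's response to the action


@dataclass
class Trajectory:
    """Start-to-finish record of a single agent run on a single task."""
    task: str
    steps: list[Step] = field(default_factory=list)

    def record(self, state: Any, action: Any, observation: Any) -> None:
        # Append steps in order; the ordering is part of the trajectory.
        self.steps.append(Step(state, action, observation))

    def __len__(self) -> int:
        return len(self.steps)


# Hypothetical toy run: the agent lists files, then reads one of them.
traj = Trajectory(task="find the TODO in the repo")
traj.record(state="start", action="ls", observation="main.py README.md")
traj.record(state="saw files", action="cat main.py", observation="# TODO: fix parser")

# Each whole trajectory is treated as one training example.
dataset = [traj]
print(len(traj), [s.action for s in traj.steps])
```

Running it prints `2 ['ls', 'cat main.py']`. Keeping the whole run as a single object, rather than scattering steps across unrelated records, mirrors the idea that the trajectory, not the individual step, is the unit a training pipeline consumes.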

Also called: trajectories

Mentioned in 34 episodes

  1. 079
    An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models
  2. 078
    Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
  3. 077
    Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
  4. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
  5. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
  6. 069
    When Smarter Models Forecast Worse: The Hidden Failure Mode in LLM Predictions
  7. 068
    The OS Trick That Makes Tree Search Practical for Coding Agents
  8. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
  9. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
  10. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
  11. 059
    Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward
  12. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
  13. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
  14. 051
    Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
  15. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe
  16. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
  17. 044
    How One Sentence and a Forged History Flip the Most Aligned Models
  18. 042
    An Agentic Scientific Computing System That Actually Remembers What It Learns
  19. 041
    When the Iteration Teaches the Model to Skip the Iteration
  20. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
  21. 037
    Why Hallucination Detectors Miss Stale Facts: A Geometric Story About What Models Know But Don't Say
  22. 035
    Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment
  23. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
  24. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
  25. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF
  26. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
  27. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM
  28. 017
    When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers
  29. 013
    Why Search Keeps Rediscovering the Same Workflow, and What That Means
  30. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
  31. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
  32. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training
  33. 005
    Why a Debugger Designed for Humans Is the Wrong Tool for an AI Agent
  34. 002
    An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light

Related concepts