Glossary · Term

synthetic data

Definition

Training examples generated by a model or a simulation rather than collected from the real world.

Data produced by simulation or by other models. It is often used to scale training in agentic, math, and code domains, where verifiable rewards or rubric trees can be constructed.
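As a rough illustration of the "verifiable reward" idea, the sketch below generates toy arithmetic tasks programmatically and scores a response by checking it against the known answer. It is a minimal example under stated assumptions, not a method taken from any of the episodes: the names make_task, reward, and make_dataset are hypothetical, and real pipelines typically use a model or a simulator as the generator.

```python
import random


def make_task(rng):
    """Generate one synthetic arithmetic task with a known answer (hypothetical helper)."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return {"prompt": f"What is {a} + {b}?", "answer": a + b}


def reward(task, response):
    """Verifiable reward: 1.0 if the response matches the known answer, else 0.0."""
    try:
        return 1.0 if int(response.strip()) == task["answer"] else 0.0
    except ValueError:
        # Responses that are not integers earn no reward.
        return 0.0


def make_dataset(n, seed=0):
    """Build n synthetic tasks; the seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [make_task(rng) for _ in range(n)]


dataset = make_dataset(5)
```

Because the generator knows each answer by construction, every response can be checked automatically, which is what makes this kind of data cheap to scale.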

Mentioned in 5 episodes

  1. 082
    Training a Deep Research Agent on 8,000 Synthetic Tasks: The Rubric Tree Trick
  2. 080
    How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents
  3. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
  4. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
  5. 059
    Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward