ECHO · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A training trick that gets free supervision out of every AI agent run by predicting what the environment will say back.

As stated in the literature

An auxiliary next-token loss applied to environment-produced tokens (e.g., terminal output) during agent RL, providing dense supervision on otherwise-wasted rollouts; layered onto GRPO with a small weight, it roughly doubles pass rates on TerminalBench-style tasks.

Why it matters: It squeezes useful training signal out of environment feedback that would otherwise just be discarded as input tokens.

For example, while an agent is being trained on coding tasks, it also gets a small auxiliary loss for predicting the next characters of the terminal's output, turning every rollout into extra supervision.

Heard on the show

“They cite independent work — ECHO — where simply adding an environment-prediction loss during agent training roughly *doubled* pass rates on a terminal benchmark.”

Episode 167 — How Teaching an AI to Predict, Not Act, Made It a Better Actor

Mentioned in 2 episodes

Related terms

agent GRPO loss reinforcement learning rollout TerminalBench token weights