Concept · 1 episode

Reward Variance

Definition

Reward variance is the statistical spread in the reward signals observed across different trajectories or rollouts, which directly inflates the variance of value-function estimates during reinforcement learning. Reducing it is critical because high-variance gradient estimates slow convergence and destabilize training. One effective technique is pooling observations from multiple trajectories that share a common state or graph node: averaging over the pooled group reduces estimation variance roughly in inverse proportion to the group's size (about 1/n for n independent observations).
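The pooling effect can be illustrated with a minimal sketch. It is not the method from the episode's paper, only a toy demonstration under stated assumptions: returns observed at a node are modeled as independent Gaussian noise around a true value of zero, and the names `pooled_value_estimates` and `estimator_variance` are hypothetical helpers introduced here. In practice, rollouts that share a state may be correlated, in which case the reduction is smaller than 1/n.

```python
import random
import statistics
from collections import defaultdict


def pooled_value_estimates(observations):
    """Average observed returns over all trajectories that visit the same node.

    `observations` is an iterable of (node_id, observed_return) pairs.
    Hypothetical helper for illustration only.
    """
    by_node = defaultdict(list)
    for node_id, observed_return in observations:
        by_node[node_id].append(observed_return)
    return {node: statistics.fmean(rets) for node, rets in by_node.items()}


def estimator_variance(group_size, trials=4000, sigma=1.0, seed=0):
    """Empirical variance of the mean of `group_size` noisy returns.

    Assumption: returns are i.i.d. Gaussian with standard deviation `sigma`,
    so the variance of the pooled mean should be close to sigma**2 / group_size.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(group_size)]
        means.append(statistics.fmean(sample))
    return statistics.pvariance(means)


if __name__ == "__main__":
    # Pool returns from three rollouts; two of them pass through node "a".
    print(pooled_value_estimates([("a", 1.0), ("a", 3.0), ("b", 2.0)]))

    # Variance of the pooled estimate shrinks as the group grows.
    for n in (1, 2, 4, 8):
        print(n, estimator_variance(n))
```

Under the independence assumption, doubling the pooled group should roughly halve the variance of the node's value estimate, which is the sense in which pooling is said to reduce estimation variance.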

Episodes covering this

165
A Free-Lunch Tweak That Lets a Tiny Agent Beat Frontier Giants
Group-Graph Policy Optimization for Long-Horizon Agentic Reinforcement Learning
Wang, Song, Zhang et al. · Peking University · 22 min · Jun 23, 2026