Definition
Reward Variance is the statistical spread in the reward signals observed across different trajectories or rollouts, which directly inflates the variance of value-function estimates during reinforcement learning. Reducing it is critical because high-variance gradient estimates slow convergence and destabilize training; one effective technique is pooling observations from multiple trajectories that share a common state or graph node, which drives down estimation variance proportionally to the size of the pooled group.