Definition
Reward variance is the noisiness of the reward signal across runs or samples — high variance makes credit assignment harder and policy gradients sloppier. It’s why a lot of effective RL work is really variance-reduction work in disguise.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.