Definition
A reinforcement-learning recipe for long-horizon agents that uses milestone progress as a dense reward signal.
A subgoal-driven RL post-training method that trains a 'potential critic' Φ to predict interpolated subgoal-completion targets. The temporal difference of the critic's predictions between consecutive states, γΦ(s_{t+1}) − Φ(s_t), then serves as a potential-based shaping reward alongside the binary task reward.
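The pieces in this definition can be illustrated with a minimal Python sketch. It covers building per-step critic targets, the critic's regression loss, and turning critic outputs into shaped rewards. Several details are assumptions and do not come from the method itself: targets rise linearly between the steps at which subgoals are completed, a weight `beta` scales the shaping term, and the binary task reward is added on the final transition. The function names are likewise hypothetical.

```python
from typing import List, Sequence


def interpolated_targets(num_steps: int, completion_steps: Sequence[int],
                         num_subgoals: int) -> List[float]:
    """Per-step critic targets in [0, 1] under an ASSUMED interpolation scheme.

    completion_steps[i] is the (hypothetical) time step at which subgoal i was
    completed, sorted ascending. The target equals the fraction of subgoals
    completed at each completion step, rises linearly in between, and stays
    flat after the last completion.
    """
    knots = [(0, 0.0)] + [(s, (i + 1) / num_subgoals)
                          for i, s in enumerate(completion_steps)]
    targets = []
    for t in range(num_steps):
        # Index of the last knot at or before step t.
        i = max(j for j, (s, _) in enumerate(knots) if s <= t)
        s0, v0 = knots[i]
        if i + 1 < len(knots):
            s1, v1 = knots[i + 1]  # s1 > t >= s0, so no division by zero
            targets.append(v0 + (v1 - v0) * (t - s0) / (s1 - s0))
        else:
            targets.append(v0)  # no further subgoal completed: hold the value
    return targets


def critic_mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error used to regress the potential critic onto its targets."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def shaped_rewards(potentials: Sequence[float], task_reward: float,
                   gamma: float = 1.0, beta: float = 1.0) -> List[float]:
    """Per-transition rewards from critic outputs Phi(s_0), ..., Phi(s_T).

    Each transition gets beta * (gamma * Phi(s_{t+1}) - Phi(s_t)), the
    potential-based shaping term. The binary task reward is added on the final
    transition (an assumption about how the two signals are combined).
    """
    if len(potentials) < 2:
        raise ValueError("need at least one transition")
    rewards = [beta * (gamma * potentials[t + 1] - potentials[t])
               for t in range(len(potentials) - 1)]
    rewards[-1] += task_reward
    return rewards


if __name__ == "__main__":
    # Six steps, two subgoals completed at steps 2 and 4, task succeeded.
    targets = interpolated_targets(6, completion_steps=[2, 4], num_subgoals=2)
    print(targets)
    print(shaped_rewards(targets, task_reward=1.0))
```

With γ = 1 the shaping terms telescope to Φ(s_T) − Φ(s_0), so they add dense per-step signal without changing which trajectories score highest overall. This is the usual motivation for choosing potential-based shaping over an arbitrary progress bonus.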