Definition
A reward that pays an AI agent more when it finishes a task in fewer steps than its peers.
In ToolCUA's RL recipe, a group-relative reward term granting a linear bonus for sub-average step counts and applying an exponentially decaying penalty above average, asymmetric to encourage shorter trajectories.