Definition
A reinforcement-learning trick that keeps generating tries on a problem until you have a useful mix of wins and losses.
Balanced Adaptive Rollout, an RL rollout strategy that adaptively expands group size per prompt until the success-failure mix produces a nontrivial gradient signal.
Also called: Balanced Adaptive Rollout