Definition
A training trick that throws out training prompts where every attempt got roughly the same score.
A prompt-filtering step in RL post-training that ranks prompts by reward variance across rollouts and updates only on high-variance prompts to avoid wasted gradient updates on uniform-reward groups.