Glossary · Term

SNR-aware filtering

← all terms

Definition

A training trick that throws out training prompts where every attempt got roughly the same score.

A prompt-filtering step in RL post-training that ranks prompts by reward variance across rollouts and updates only on high-variance prompts to avoid wasted gradient updates on uniform-reward groups.

Mentioned in 1 episode

  1. 010
    When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL

Related concepts