SNR-aware filtering · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A training trick that throws out training prompts where every attempt got roughly the same score.

As stated in the literature

A prompt-filtering step in RL post-training that ranks prompts by reward variance across rollouts and updates only on high-variance prompts to avoid wasted gradient updates on uniform-reward groups.

Why it matters: Skipping prompts that produce uniform rewards saves a large fraction of training compute while losing almost no learning signal.

For example, if every one of eight rollouts on a prompt gets the same score, the prompt gets dropped from the training step because the gradient would be near zero anyway.

Heard on the show

“They call it SNR-aware filtering.”

Episode 010 — When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL

Mentioned in 1 episode

010
When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL

Related concepts

SNR-Aware Filtering

Related terms

gradient post-training reinforcement learning rollout