template collapse · Glossary · AI Papers: A Deep Dive

Definition

Plain language

When an AI agent's reasoning ends up looking the same on every problem because training has stamped it into a single shape.

As stated in the literature

A failure mode in agentic RL where the policy retains within-input output diversity but loses mutual information between input and reasoning trace, producing fluent-looking but input-agnostic templates; detectable via cross-prompt likelihood scoring even when entropy looks healthy.

Why it matters: It is a sneaky failure mode that ordinary entropy metrics miss, so models can look healthy on dashboards while having quietly stopped reasoning about their inputs.

For example, after RL training, an agent might open every problem with 'Let me think step by step. First, I will analyze the inputs...' regardless of whether the task is a math problem or a flight booking.

Heard on the show

“The authors call that template collapse.”

Episode 010 — When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL

Mentioned in 1 episode

010
When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL

Related terms

agent entropy mutual information policy reinforcement learning