Glossary · Term

directive weighting error

← all terms

Definition

When a model's competing instructions get re-ranked by whatever's most vivid in context, and the wrong rule wins.

A proposed account of LLM instruction-following failures in which contradictory soft directives compete without enforced priority, and contextual salience determines which directive dominates downstream behavior.

Mentioned in 1 episode

  1. 049
    An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked