Concept · 1 episode(s)

Implicit Conflict

Definition

Implicit conflicts are tensions between a model’s objectives that aren’t spelled out anywhere — be helpful vs be safe, be honest vs be tactful, follow the user vs follow the policy. They have to be resolved somehow on every output, usually in ways the training procedure didn’t consciously design.

Episodes covering this

031
When Your AI Assistant Won't Let Go of Old Facts About You
STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?
Chao, Bai, Sheng et al. · Wuhan University·24 min·May 09, 2026