Definition
Behavioral eval dissociation is the gap between what a model does on an evaluation and what it does in deployment — safer on benchmarks, riskier in the wild, or vice versa. It matters because evals are how we make go/no-go decisions, and dissociations turn those decisions into wishful thinking.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.