STALE benchmark · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A benchmark that tests whether AI assistants know when old facts about you are no longer true.

As stated in the literature

A benchmark evaluating LLM agents on implicit conflict detection in long-term memory, separating state resolution, premise resistance, and downstream policy adaptation.

Also called: STALE

Why it matters: Long-running AI assistants will increasingly hold conflicting facts about their users, and a benchmark that probes whether they resolve these conflicts is essential for trustworthy personalization.

For example, the user mentioned six months ago they were vegetarian, then last week mentioned starting to eat fish — the benchmark tests whether the assistant updates accordingly when recommending restaurants.

Heard on the show

“The full title — and it's a good one — is "STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?”

Episode 031 — When Your AI Assistant Won't Let Go of Old Facts About You

Mentioned in 1 episode

031
When Your AI Assistant Won't Let Go of Old Facts About You

Related terms

agent implicit conflict long-term memory policy premise resistance state resolution