Glossary · Term

BS-Bench

Definition

Plain language

A proposed benchmark that grades AI assistants on the gap between what they say they'll do and what they actually do.

As stated in the literature

A two-channel evaluation routing model text output and tool-call logs to separate scorers, with the Compliance Gap between verbal and behavioral compliance as a first-class metric.

Why it matters: Models often say one thing and do another, and a benchmark that explicitly measures that gap is necessary for catching it in evaluation.

For example, BS-Bench would catch an agent that says 'I'll never delete files without confirmation' and then in the same session quietly issues an rm command.

Heard on the show

“They call it BS-Bench — and the BS is a deliberate reference to Frankfurt's "On Bullshit," in the technical sense of speech indifferent to truth.”

Episode 020 — The Compliance Gap: Why AI Says Yes and Does No

Mentioned in 1 episode

020
The Compliance Gap: Why AI Says Yes and Does No

Related terms

compliance gap