Glossary · Term

BS-Bench

← all terms

Definition

A proposed benchmark that grades AI assistants on the gap between what they say they'll do and what they actually do.

A two-channel evaluation routing model text output and tool-call logs to separate scorers, with the Compliance Gap between verbal and behavioral compliance as a first-class metric.

Mentioned in 1 episode

  1. 020
    The Compliance Gap: Why AI Says Yes and Does No