
tau-bench


Definition

A benchmark that tests AI agents on realistic customer-service conversations with a simulated user.

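A multi-turn agentic benchmark covering the retail and airline domains, in which the agent must call domain tools and follow policy rules over a full conversation. Reliability is scored with pass^k, the probability that all k repeated trials of the same task succeed. It is distinct from tau2-bench, which extends it with additional tool environments.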
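The reliability metric τ-bench reports is usually written pass^k: the chance that all k independent trials of a task succeed, as opposed to the more familiar pass@k, where a single success among k trials is enough. Below is a minimal sketch of both estimators, assuming results are recorded per task as (trials, successes) counts. The function names and input format are illustrative assumptions, not part of any official τ-bench tooling.

```python
from math import comb
from typing import Sequence, Tuple

# Each entry is (n_trials, n_successes) for one task (hypothetical input format).
Results = Sequence[Tuple[int, int]]


def pass_hat_k(results: Results, k: int) -> float:
    """pass^k: probability that ALL k trials of a task succeed, averaged over tasks.

    Uses the estimator comb(c, k) / comb(n, k) per task, where n is the number
    of trials run and c the number of successes.
    """
    total = 0.0
    for n, c in results:
        if k > n:
            raise ValueError("k cannot exceed the number of trials per task")
        # comb(c, k) is 0 when c < k, so such tasks contribute nothing.
        total += comb(c, k) / comb(n, k)
    return total / len(results)


def pass_at_k(results: Results, k: int) -> float:
    """pass@k: probability that AT LEAST ONE of k trials succeeds, averaged over tasks."""
    total = 0.0
    for n, c in results:
        if k > n:
            raise ValueError("k cannot exceed the number of trials per task")
        # 1 - P(all k sampled trials are failures); comb(n - c, k) is 0 when n - c < k.
        total += 1 - comb(n - c, k) / comb(n, k)
    return total / len(results)
```

For three tasks each run 8 times that succeed 8, 4, and 0 times, both metrics equal 0.5 at k = 1. At k = 2, pass^k falls to about 0.405 while pass@k rises to about 0.595, which is why pass^k is the stricter measure of whether an agent behaves consistently across repeated attempts.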

Also called: τ-bench, tau bench

Mentioned in 1 episode

  1. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface