Definition
A benchmark that tests AI agents on realistic customer-service conversations with a simulated user.
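A multi-turn agentic benchmark covering the Retail and Airline domains, in which an agent converses with a simulated user, calls domain tools, and must follow domain policy. It is evaluated with the pass^k reliability metric (the probability that all k independent trials of a task succeed), which differs from pass@k (at least one of k trials succeeds). Distinct from tau2-bench, which extends it with a Telecom domain and a dual-control setting in which the user can also act through tools.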
Also called: τ-bench, tau bench
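Example
A minimal sketch of how the pass^k reliability metric can be estimated from repeated trials, assuming each task is run n times and c of those runs succeed. The per-task estimate used here is C(c, k) / C(n, k), averaged over tasks. The function name pass_hat_k and the sample counts are illustrative assumptions, not part of any official tau-bench code.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Estimate pass^k: the chance that k trials of a task ALL succeed.

    successes_per_task: number of successful runs for each task (c per task).
    num_trials: number of runs per task (n), the same for every task.
    k: number of trials that must all succeed, with 1 <= k <= n.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must satisfy 1 <= k <= num_trials")
    if not successes_per_task:
        raise ValueError("at least one task is required")
    if any(c < 0 or c > num_trials for c in successes_per_task):
        raise ValueError("each success count must lie in [0, num_trials]")

    # Number of ways to pick k runs out of n for a single task.
    total_subsets = comb(num_trials, k)
    # Fraction of k-subsets made up only of successful runs, per task.
    per_task = [comb(c, k) / total_subsets for c in successes_per_task]
    # Average over tasks.
    return sum(per_task) / len(per_task)


# Hypothetical results: three tasks, four runs each, with 4, 2, and 0 successes.
counts = [4, 2, 0]
for k in (1, 2, 4):
    print(k, round(pass_hat_k(counts, 4, k), 3))
# prints:
# 1 0.5
# 2 0.389
# 4 0.333
```

The score falls as k grows because a task that succeeds only some of the time contributes less and less, which is why the metric is read as a measure of consistency rather than peak capability.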