Definition
A benchmark of multi-turn customer-service dialogues used to test how well agents handle realistic service workflows.
A multi-turn agentic benchmark covering the Retail, Airline, and Telecom domains, used as an out-of-distribution transfer evaluation for tool-use models.