Definition
A benchmark of enterprise-workflow tasks for AI agents.
An evaluation suite of multi-step, business-style tasks spanning analysis, reporting, and communications, used to stress-test general-purpose LLM agents.
Also called: The Agent Company
Mentioned in 2 episodes: 035, 017