
TheAgentCompany

Definition

Plain language

A benchmark of enterprise-workflow tasks for AI agents.

As stated in the literature

An evaluation suite of business-style multi-step tasks spanning analysis, reporting, and communications, used to stress-test general LLM agents.

Also called: The Agent Company

Why it matters: It evaluates whether general LLM agents can actually carry out the kinds of office tasks vendors keep promising they'll automate.

For example, an agent might be asked to read a quarterly sales spreadsheet, write a memo summarizing trends, and email it to the right team.

Heard on the show

“The other two benchmarks (TheAgentCompany, which is the enterprise workflow one, and SWE-Bench Pro, the code repair one) show the same patterns but messier.”

Episode 035 — Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment

Mentioned in 2 episodes

035
Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment
017
When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers

Related terms

agent