Glossary · Term

TerminalBench

Definition

A benchmark of hard command-line tasks for agentic systems.

A command-line task benchmark for AI agents covering operations like file recovery, system administration, and shell-driven problem solving.

Also called: TerminalBench 2.0, Terminal-Bench

Mentioned in 1 episode

  1. Episode 084: Terminal Agents Get Free Supervision From The Tokens We've Been Throwing Away