Definition
A benchmark of hard command-line tasks for agentic systems.
A command-line task benchmark for AI agents covering operations like file recovery, system administration, and shell-driven problem solving.
Also called: TerminalBench 2.0, Terminal-Bench