Glossary · Term

Terminal-Bench

Definition

A benchmark of hard command-line tasks for agentic systems.

It evaluates AI agents on command-line work such as file recovery, system administration, and shell-driven problem solving.

Also called: Terminal-Bench v2

Mentioned in 2 episodes

  1. 046
    When the AI Optimizer Edits the Grade Book: Why Harnessing Evolution Needs a Wall
  2. 003
    How to Pick the Best of Sixteen Coding Agent Rollouts