Glossary · Term

pass-cubed


Definition

A strict way of grading AI agents where you only count a task as solved if the agent gets it right three times in a row.

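The pass^3 (read "pass-cubed") reliability metric used on tau-bench and tau2-bench. It counts a task as solved only when the agent succeeds in all three of three independent runs. This makes it stricter than pass@3, which credits a task if any one of three attempts succeeds, and it is the production-relevant bar for stochastic agent systems.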

Also called: pass^3, pass-at-three
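A minimal sketch of how the metric can be computed, assuming each task is run n ≥ 3 times and each run is recorded as pass/fail. It uses the combinatorial estimator commonly paired with tau-bench's pass^k, which averages C(c, k) / C(n, k) over tasks, where c is the number of successful runs. pass@k is included only for contrast. The function names `pass_hat_k` and `pass_at_k` are hypothetical and are not taken from the tau-bench codebase.

```python
from math import comb


def pass_hat_k(results: list[list[bool]], k: int = 3) -> float:
    """pass^k: estimated probability that ALL k runs of a task succeed.

    results[i] holds the pass/fail outcomes of n >= k independent runs of task i.
    Per task, C(c, k) / C(n, k) is the chance that k runs drawn without
    replacement from the n recorded runs are all successes.
    """
    scores = []
    for runs in results:
        n, c = len(runs), sum(runs)
        if n < k:
            raise ValueError("each task needs at least k recorded runs")
        # math.comb returns 0 when c < k, so such tasks score 0.
        scores.append(comb(c, k) / comb(n, k))
    return sum(scores) / len(scores)


def pass_at_k(results: list[list[bool]], k: int = 3) -> float:
    """pass@k, for contrast: estimated probability that AT LEAST ONE of k runs succeeds."""
    scores = []
    for runs in results:
        n, c = len(runs), sum(runs)
        if n < k:
            raise ValueError("each task needs at least k recorded runs")
        # One minus the chance that k drawn runs are all failures.
        scores.append(1 - comb(n - c, k) / comb(n, k))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical outcomes: three tasks, three runs each.
    runs = [
        [True, True, True],     # solved every time
        [True, False, True],    # flaky: counts for pass@3, not pass^3
        [False, False, False],  # never solved
    ]
    print(pass_hat_k(runs, 3))  # one of three tasks passes all runs
    print(pass_at_k(runs, 3))   # two of three tasks pass at least once
```

The gap between the two numbers is the point of the metric. An agent that succeeds on each run independently with probability p scores roughly p³ on pass^3, so a 90% per-run success rate drops to about 73%, while its pass@3 rises toward 100%.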

Mentioned in 2 episodes

  1. 071 · When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
  2. 047 · When Agent Benchmarks Lie: The Harness Problem in Open-Source AI