Glossary · Term

HIL-Bench

← all terms

Definition

A benchmark that tests whether AI assistants ask for help at the right moments.

A human-in-the-loop evaluation suite penalizing both over-clarification and missed escalation in agentic interactions.

Mentioned in 1 episode

  1. 035
    Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment