Glossary · Term

LHAW

Definition

Plain language

A benchmark of underspecified agent tasks used to study clarification behavior.

As stated in the literature

A benchmark of deliberately underspecified long-horizon tasks used to study when LLM agents should ask for clarification versus proceed under ambiguity.

Why it matters: It tests a behavior most benchmarks ignore: knowing when to stop and ask rather than guess. That judgment is central to building agents people can trust.

For example, the user asks the agent to 'book a flight to SF next week' without specifying the day, budget, or airline, and the benchmark scores whether the agent asks before booking.
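To make the scoring idea concrete, here is a toy sketch. It is not LHAW's actual metric or data format. It assumes an agent's run is logged as a list of actions with hypothetical kinds such as "ask" and "book", and checks whether a clarifying question comes before the first committing action.

```python
from dataclasses import dataclass


@dataclass
class Action:
    # Hypothetical action kinds for illustration: "ask", "search", "book".
    kind: str
    detail: str = ""


def asked_before_committing(trace: list[Action], commit_kind: str = "book") -> bool:
    """Return True if an "ask" action occurs before the first commit action.

    Toy rule (an assumption, not LHAW's scoring): scan the trace in order;
    the first "ask" passes, the first commit fails, and a trace containing
    neither also fails.
    """
    for action in trace:
        if action.kind == "ask":
            return True
        if action.kind == commit_kind:
            return False
    return False


if __name__ == "__main__":
    careful = [
        Action("search", "flights to SF"),
        Action("ask", "Which day, budget, and airline?"),
        Action("book", "user-confirmed flight"),
    ]
    hasty = [Action("search", "flights to SF"), Action("book", "cheapest Monday flight")]
    print(asked_before_committing(careful))  # True
    print(asked_before_committing(hasty))    # False
```

A real benchmark would also need to judge whether the question was necessary and well targeted; this sketch only captures the ordering check described in the example.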

Heard on the show

“For listeners who want to go further, the underlying benchmark — LHAW, from Pu and colleagues earlier this year — provides the underspecified task variants this paper builds on.”

Episode 035 — Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment

Mentioned in 1 episode

035
Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment

Related terms

agent