Definition
A benchmark of deliberately underspecified long-horizon tasks used to study when LLM agents should ask for clarification versus proceed under ambiguity.