Glossary · Term

premature exploitation

Definition

When an AI agent acts on what it thinks it knows before checking how the world actually works.

A documented agent failure mode in which a model applies prior knowledge to a task before gathering enough environment-specific information. It often appears after reinforcement learning (RL) on task reward alone has narrowed the model's exploratory behavior.
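The definition can be made concrete with a toy example. The sketch below is illustrative only and is not taken from the episode: the action names, reward values, and the `run_agent` helper are all hypothetical, and the environment is assumed to be deterministic so that a single observation per action is enough. One agent trusts its prior from the first step; the other spends one step per action checking the environment before it exploits.

```python
def run_agent(true_reward, prior, steps, probe_first):
    """Return the total reward collected over `steps` actions.

    true_reward: hidden per-action payoff of the environment (hypothetical values).
    prior:       the agent's beliefs before it has observed anything.
    probe_first: if True, try every action once before exploiting.
    """
    estimate = dict(prior)        # beliefs start out as prior knowledge
    actions = list(true_reward)
    total = 0
    for t in range(steps):
        if probe_first and t < len(actions):
            action = actions[t]   # gather environment-specific information
        else:
            action = max(estimate, key=estimate.get)  # exploit current beliefs
        reward = true_reward[action]
        estimate[action] = reward  # deterministic environment: one sample suffices
        total += reward
    return total


# Assumed setup: the prior strongly favors an action that is poor in this environment.
true_reward = {"familiar_tool": 2, "unfamiliar_tool": 10}
prior = {"familiar_tool": 9, "unfamiliar_tool": 1}

premature = run_agent(true_reward, prior, steps=10, probe_first=False)
probing = run_agent(true_reward, prior, steps=10, probe_first=True)
print(premature, probing)
```

In this setup the first agent never tries `unfamiliar_tool`: even after `familiar_tool` disappoints, its observed payoff still beats the untested prior for the alternative, so the agent stays locked in. The second agent pays a small cost up front and then settles on the better action.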

Mentioned in 1 episode

  1. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents