Concept · 2 episode(s)

Multi-Armed Bandit

Definition

The multi-armed bandit is the simplest formal statement of the exploration–exploitation problem: in each round you pull one of K levers (arms), observe a reward drawn from that lever's unknown distribution, and must balance learning which lever is best against earning from the one that looks best so far. Its standard algorithms, such as UCB and Thompson sampling, underlie much of online experimentation, including adaptive A/B testing.
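
To make the two algorithms named above concrete, here is a minimal Python sketch of UCB1 and of Thompson sampling on a simulated bandit. It is an illustration under stated assumptions, not a reference implementation: rewards are assumed to be Bernoulli (0 or 1), the Thompson variant assumes a uniform Beta(1, 1) prior, and the helper `make_bernoulli_bandit`, the arm success rates, and the horizon are all hypothetical choices made for the example.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms      # times each arm was pulled
    totals = [0.0] * n_arms    # summed reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # pull every arm once to initialise its estimate
        else:
            # Empirical mean plus an exploration bonus that shrinks
            # as an arm accumulates pulls.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts


def thompson_bernoulli(pull, n_arms, horizon, rng):
    """Thompson sampling, assuming 0/1 rewards and a Beta(1, 1) prior per arm."""
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(horizon):
        # Draw one plausible success rate per arm from its posterior,
        # then play the arm whose draw is highest.
        samples = [
            rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if pull(arm) == 1:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]


def make_bernoulli_bandit(means, rng):
    """Hypothetical simulator: arm `a` pays 1 with probability means[a], else 0."""
    def pull(arm):
        return 1 if rng.random() < means[arm] else 0
    return pull


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.1, 0.5, 0.9]    # made-up success rates for illustration
    pull = make_bernoulli_bandit(means, rng)
    print("UCB1 pulls per arm:    ", ucb1(pull, len(means), 3000))
    print("Thompson pulls per arm:", thompson_bernoulli(pull, len(means), 3000, rng))
```

Both functions return how often each arm was pulled. With a gap this wide between the arms, most pulls should concentrate on the last arm after a short exploratory phase. The design difference is where the exploration comes from: UCB1 adds a deterministic confidence bonus to each arm's average reward, while Thompson sampling randomises by drawing from each arm's posterior.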

Episodes covering this