Definition
The multi-armed bandit is the canonical exploration–exploitation problem: in each round, pull one of K levers (arms), observe a reward drawn from that arm's unknown distribution, and balance learning which arm is best against earning from the one that looks best so far. Its algorithms, such as UCB and Thompson sampling, underlie much of online experimentation.
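
The trade-off can be made concrete with a minimal sketch of UCB1, one common member of the UCB family. It assumes Bernoulli (0/1) rewards and a simulated environment; the function name `ucb1` and the parameters `arm_means`, `horizon`, and `seed` are hypothetical names chosen for illustration, not part of any library. Each arm's score is its empirical mean plus a bonus that shrinks as the arm is pulled more often, so rarely tried arms still get explored.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms (an illustrative assumption).

    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of times each arm has been pulled
    sums = [0.0] * k   # cumulative reward observed for each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once so each mean is defined.
            arm = t - 1
        else:
            # Exploit (empirical mean) plus explore (confidence bonus).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated environment: reward is 1 with probability arm_means[arm].
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


# Three hypothetical arms; the learner does not see these means directly.
counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

Over a long enough horizon, most pulls should concentrate on the arm with the highest mean, while the weaker arms are still sampled occasionally as the logarithmic bonus grows.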