Glossary · Term

Stackelberg game


Definition

A two-player situation where one player commits to a move first and the other responds optimally.

A leader-follower game in which one player (the leader) commits to a strategy first and the other (the follower) observes that commitment and plays a best response. In RLHF, it is used to model the dynamic between the policy and the reward model.
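To make the leader-follower logic concrete, here is a minimal sketch for a generic finite two-player game. It is not the RLHF formulation from the episode. It assumes pure strategies only (in general the leader may commit to a mixed strategy, which can do even better), and it breaks ties in the follower's best response in the leader's favour. The function name `stackelberg_pure` and the payoff numbers are hypothetical, chosen only for illustration. The leader evaluates each possible commitment by predicting the follower's best response, then picks the commitment with the highest resulting payoff (backward induction).

```python
from typing import List, Tuple

# Payoff matrices indexed as matrix[leader_action][follower_action].
Matrix = List[List[float]]


def stackelberg_pure(leader_payoff: Matrix, follower_payoff: Matrix) -> Tuple[int, int]:
    """Hypothetical helper: return (leader_action, follower_action) for a
    pure-strategy Stackelberg outcome found by backward induction.

    Assumption: when the follower is indifferent between several best
    responses, the tie is broken in the leader's favour.
    """
    best = None  # (leader_value, leader_action, follower_action)
    for a, follower_row in enumerate(follower_payoff):
        # Step 1: the follower observes commitment `a` and best-responds.
        top = max(follower_row)
        responses = [b for b, v in enumerate(follower_row) if v == top]
        # Tie-break among the follower's best responses in the leader's favour.
        b = max(responses, key=lambda j: leader_payoff[a][j])
        # Step 2: the leader keeps the commitment with the highest payoff.
        value = leader_payoff[a][b]
        if best is None or value > best[0]:
            best = (value, a, b)
    return best[1], best[2]


# Illustrative 2x2 game (made-up payoffs): leader actions 0/1, follower actions 0/1.
LEADER = [[2, 4],
          [1, 3]]
FOLLOWER = [[1, 0],
            [0, 1]]

if __name__ == "__main__":
    print(stackelberg_pure(LEADER, FOLLOWER))  # (1, 1): leader earns 3
```

In this made-up game, action 0 strictly dominates action 1 for the leader when both players move simultaneously, so the simultaneous-move outcome is (0, 0) with a leader payoff of 2. Once the leader can commit first, it prefers the "dominated" action 1, because the follower then responds with action 1 and the leader earns 3. That gap is the value of moving first.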

Also called: Stackelberg

Mentioned in 1 episode

  1. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF
