Definition
Self-play trains a model by having it play against versions of itself — in games, in dialog, in debate — using the improving opponent as an automatic curriculum. It’s how AlphaZero learned chess and a recurring template wherever reward is hard to specify but win/loss is cheap.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.