Concept · 1 episode(s)

Sparse Policy Selection


Definition

Policy selection is the choice, at deployment or evaluation time, of which trained policy to actually use. It matters most when training has produced a population of candidates (different checkpoints, different RL runs, different seeds). It's easy to overfit selection to the evaluations used for choosing: the winner is picked partly for favorable noise, so its evaluation score looks better than the underlying policy really is.
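To make the overfitting risk concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than a method from any episode or paper: the function names (`noisy_eval`, `select_and_report`, `average_gap`), the unit-variance Gaussian noise on episode returns, and the setup in which every candidate policy has the same true value of 0. Under those assumptions, any apparent advantage of the selected policy is pure selection bias.

```python
import random
import statistics


def noisy_eval(true_value, n_episodes, rng):
    """Mean return over n_episodes; each episode is true_value plus unit Gaussian noise (assumed)."""
    return statistics.fmean(rng.gauss(true_value, 1.0) for _ in range(n_episodes))


def select_and_report(true_values, n_episodes, rng):
    """Pick the best-scoring policy, then re-score it on fresh episodes."""
    # Score every candidate on the episodes used for selection.
    selection_scores = [noisy_eval(v, n_episodes, rng) for v in true_values]
    best = max(range(len(true_values)), key=selection_scores.__getitem__)
    # Held-out score: new episodes that played no part in the choice.
    held_out_score = noisy_eval(true_values[best], n_episodes, rng)
    return best, selection_scores[best], held_out_score


def average_gap(n_policies=20, n_episodes=10, n_trials=2000, seed=0):
    """Average the winner's selection score and held-out score over many trials."""
    rng = random.Random(seed)
    true_values = [0.0] * n_policies  # hypothetical population: all equally good
    selection, held_out = [], []
    for _ in range(n_trials):
        _, s, h = select_and_report(true_values, n_episodes, rng)
        selection.append(s)
        held_out.append(h)
    return statistics.fmean(selection), statistics.fmean(held_out)


if __name__ == "__main__":
    sel_mean, held_mean = average_gap()
    print(f"winner's score on selection episodes: {sel_mean:.3f}")
    print(f"winner's score on held-out episodes:  {held_mean:.3f}")
```

In this setup the winner's score on the selection episodes should come out noticeably above zero, while its held-out score stays close to the true value of zero. The design choice the sketch illustrates is simply to report the selected policy's performance on episodes that were not used to select it.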
