Definition
How easy it is for an AI to game its own scoring system.
A formal property of reward functions describing the extent to which optimal policies form a manifold along axes the reward does not observe, leading to families of equally-rewarded policies that diverge in unmeasured behavior.