Definition
Value generalization is the extent to which the values inculcated in a model during training transfer to situations outside the training distribution: new contexts, new actors, new stakes. The hopeful version of alignment depends on values generalizing well; the worried version is mostly about cases where they don’t.