Definition
A measure of how well a language model predicts text, normalized by the size of the text in bytes so that models with different tokenizers can be compared. Lower is better.
A language-modeling metric equal to the cross-entropy of the text in bits (base-2 logarithm) divided by its length in bytes rather than in tokens; used in compression-style evaluation, including the Autoresearch challenge.
Also called: BPB
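Example
A minimal Python sketch of the computation, assuming the model's loss is available as a negative log-likelihood in nats (natural log) summed over all tokens, and that text length is counted in UTF-8 bytes. The name `bits_per_byte` and its parameters are illustrative and not taken from any particular evaluation harness.

```python
import math


def bits_per_byte(total_nll_nats: float, text: str) -> float:
    """Convert a summed token-level loss in nats to bits per byte.

    total_nll_nats: negative log-likelihood summed over every token
                    of `text` (assumed to be in nats).
    text:           the evaluated text; its size is taken as the
                    number of UTF-8 bytes (an assumption of this sketch).
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    # Dividing by ln(2) converts nats to bits; dividing by the byte
    # count removes the dependence on how the tokenizer split the text.
    return total_nll_nats / (math.log(2) * n_bytes)
```

For instance, a summed loss of 6.0 nats over the 11-byte string "hello world" gives roughly 0.787 bits per byte, whether the tokenizer split the string into 2 tokens or 11.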