bits-per-byte · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A measure of how well a model predicts text, normalized by the number of bytes in the text rather than the number of tokens, so scores can be compared across models with different vocabularies.

As stated in the literature

A language-modeling metric equal to cross-entropy per byte rather than per token, used in compression-style evaluation including the Autoresearch challenge.

Also called: BPB
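The definition above amounts to a unit conversion, sketched below in Python. The sketch assumes the evaluation reports cross-entropy in nats (natural log), summed over all predicted tokens. The function name `bits_per_byte` and the loss value are illustrative and not taken from any particular codebase.

```python
import math


def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by the length of the raw text instead of the token count.
    return total_nll_nats / (math.log(2) * num_bytes)


text = "hello world"
num_bytes = len(text.encode("utf-8"))  # length in UTF-8 bytes, not characters

# Hypothetical summed loss, chosen so the model spends one bit per byte.
total_nll_nats = num_bytes * math.log(2)

bpb = bits_per_byte(total_nll_nats, num_bytes)
print(f"{bpb:.3f} bits per byte")
```

If the evaluation reports a mean loss per token instead, multiply it by the token count first to recover the summed loss.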

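Why it matters: Per-token perplexity depends on how a tokenizer splits the text, so two models with different vocabularies are not scored on the same units. Bits-per-byte divides the total loss by the byte length of the same underlying text, which makes it comparable across tokenizers and a fairer metric when comparing very different model architectures.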

For example, a model that scores 1.0 bits-per-byte on a text corpus needs, on average, one bit to encode each byte under its predictions. A model that guessed uniformly over all 256 possible byte values would score 8.0, so lower is better.
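The comparability claim can be illustrated with a small Python sketch. It assumes two hypothetical models evaluated on the same 1,000-byte text, one with a coarse tokenizer and one with a fine tokenizer. All numbers are made up for illustration.

```python
import math

LN2 = math.log(2)


def per_token_perplexity(mean_nll_nats: float) -> float:
    """Perplexity per token from a mean per-token loss in nats."""
    return math.exp(mean_nll_nats)


def bits_per_byte(mean_nll_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Total loss in bits divided by the byte length of the text."""
    return mean_nll_nats * num_tokens / (LN2 * num_bytes)


NUM_BYTES = 1000  # same underlying text for both hypothetical models

# Coarse tokenizer: fewer tokens, each harder to predict (4 bits per token).
coarse = {"num_tokens": 250, "mean_nll_nats": 4 * LN2}
# Fine tokenizer: more tokens, each easier to predict (2 bits per token).
fine = {"num_tokens": 500, "mean_nll_nats": 2 * LN2}

ppl_coarse = per_token_perplexity(coarse["mean_nll_nats"])
ppl_fine = per_token_perplexity(fine["mean_nll_nats"])
bpb_coarse = bits_per_byte(coarse["mean_nll_nats"], coarse["num_tokens"], NUM_BYTES)
bpb_fine = bits_per_byte(fine["mean_nll_nats"], fine["num_tokens"], NUM_BYTES)

print(f"perplexity: coarse={ppl_coarse:.1f}, fine={ppl_fine:.1f}")
print(f"bits per byte: coarse={bpb_coarse:.2f}, fine={bpb_fine:.2f}")
```

In this constructed case the per-token perplexities differ by a factor of four even though both models spend the same total number of bits on the text, which is why their bits-per-byte scores agree.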

Heard on the show

“Both are trying to make a small language model train better — same script, same baseline, the score is something called validation bits-per-byte, where lower is better.”

Episode 095 — Seven Wins to Zero: How Organizing AI Agents Like a Lab Changes the Search

Mentioned in 2 episodes

Related terms

Autoresearch, cross-entropy, token