Definition
A large cleaned-up collection of educational web text used to train language models.
A pretraining corpus derived from the FineWeb web-crawl dataset by keeping only pages that a classifier rates as having high educational value; it is widely used to train open-weight language models.
Also called: FineWeb
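The filtering step described above can be pictured as a score threshold applied page by page. The sketch below assumes each page has already been given an educational-value score on a 0-5 scale. The `Page` type, the `edu_score` field, the `filter_educational` function, and the default cutoff of 3.0 are all hypothetical names and values chosen for illustration, not the actual pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Page:
    text: str
    # Hypothetical: an educational-value score (0-5) assigned by some classifier.
    edu_score: float


def filter_educational(pages: Iterable[Page], threshold: float = 3.0) -> Iterator[Page]:
    """Yield only pages whose score meets the (illustrative) threshold."""
    for page in pages:
        if page.edu_score >= threshold:
            yield page


# Usage with made-up pages and scores.
sample = [
    Page("Photosynthesis converts light energy into chemical energy.", 4.2),
    Page("Buy cheap watches now!", 0.3),
    Page("An introduction to adding fractions.", 3.0),
]
kept = [p.text for p in filter_educational(sample)]
print(kept)
```

Streaming the pages through a generator keeps memory use flat, which matters when the input is a web-scale crawl rather than a short list.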