BigCodeBench

Definition

A benchmark of realistic coding tasks used to test code-writing models.

A coding benchmark of practical Python programming tasks that require using real libraries and combining multiple function calls to solve them.

Also called: big-CODE-bench

Mentioned in 2 episodes

  1. 013
    Why Search Keeps Rediscovering the Same Workflow, and What That Means
  2. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training