Definition
A benchmark of realistic, practical Python programming tasks, emphasizing library usage and integration challenges, used to test code-writing models.
Also called: big-CODE-bench
Mentioned in 2 episodes
013
007