Definition
A small set of programming problems used as a standard test for code-generating models.
OpenAI's hand-written benchmark of Python programming problems, each paired with unit tests, used to measure whether generated code is functionally correct (that is, whether it passes the tests).
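As a rough illustration of what scoring against unit tests means, here is a minimal sketch in Python. The toy problem, the `passes_tests` helper, and the `check(candidate)` test convention are assumptions made for this example, not the benchmark's actual harness. A real harness would also sandbox execution and enforce timeouts, which this sketch does not.

```python
def passes_tests(candidate_src: str, test_src: str, entry_point: str) -> bool:
    """Return True if the candidate code passes every assertion in the test code.

    Hypothetical helper: runs untrusted code with plain exec(), so it is only
    suitable for toy inputs you wrote yourself.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)              # defines the generated function
        exec(test_src, namespace)                   # defines check(candidate)
        namespace["check"](namespace[entry_point])  # raises AssertionError on failure
    except Exception:
        return False
    return True


# Toy problem (assumed for illustration): implement add(a, b).
test_src = """
def check(candidate):
    assert candidate(1, 2) == 3
    assert candidate(-1, 1) == 0
"""

# Two stand-ins for model outputs: one correct, one buggy.
samples = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a - b\n",
]

results = [passes_tests(src, test_src, "add") for src in samples]
pass_rate = sum(results) / len(results)  # fraction of samples that pass all tests
print(results, pass_rate)
```

The key design point is that a sample counts as correct only if it passes every test, so code that merely resembles a reference solution earns no credit.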
Mentioned in 2 episodes
060
013