Concept · 3 episode(s)

Scalable Oversight

← all concepts

Definition

Scalable oversight is the research program of supervising AI systems whose outputs we can’t fully evaluate ourselves — because the model is more capable than the human, or the domain is too complex. Debate, recursive reward modeling, and constitutional AI are all proposed answers.

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.