Topic · 6 episodes across 1 review
Can We Still Watch the Model Think? Oversight and Monitoring
A sobering cluster: the transcript often can't reveal misbehavior, chain-of-thought monitoring fails across languages, models can resist their own training, and more capability sometimes means less trustworthy outputs.