The field, as covered by the show.
This is a periodic snapshot of the AI research field as covered by paperdive.ai — one episode, one paper, synthesised into a moving picture of where the work is going. Because the field moves fast, older findings here may appear only as brief context or be omitted where the consensus has shifted past them. Citations like E032 link to the relevant episode page, where the close reading and discussion live.
Agentic systems and tool use
Agents are no longer a model story — they are a stack of harness, tool, memory, and search decisions, and most of the recent progress lives at those seams rather than in the weights.
Alignment, safety, and adversarial robustness
Safety has fragmented into several distinct failure modes — propensity vs capability, persuasion across trust boundaries, ambient meltdowns — and the field is starting to architect around them rather than train them away.
Reinforcement learning and post-training for reasoning
The picture of what RL actually does to language models is sharpening, and the effect looks far smaller, more targeted, and more easily replicated than the AlphaGo-style narrative suggested.
Mechanistic interpretability and internal state
A growing body of work is reading specific behaviours (sycophancy, persuasion, cooperation, judgment, staleness) off localised circuits, with practical consequences for how to monitor and steer models.
Evaluation, measurement, and what we are actually scoring
Multiple papers converge on the same uncomfortable point: many leaderboard numbers measure something other than what they claim — harness fit, prompt format, observer effects, or even the wrong scoring rule entirely.
AI for scientific discovery
Autonomous systems are now solving open problems and running real instruments — but the most informative results are about which parts of the scientific workflow agents are actually doing, versus which parts remain stubbornly human.
Efficient architectures and serving
Long-context inference, KV-cache pressure, and agent-shaped workloads are being attacked from several directions — sparse attention reframings, KV-cache-free architectures, latent recurrence, and OS-level fixes for sandbox state.
Multi-agent systems and emergent dynamics
Multiple agents are now common in production, and the failure modes are turning out to be coordination problems, semantic collapse, and capability paradoxes — not capacity limits.