Literature review · 6 episode(s)

AI for scientific discovery


Mathematics: the clearest signal

Coupling an LLM to the compiler turned a chunk of mathematical AI from 'plausible-looking text' into externally verified results, and a Google system solved nine open Erdős problems including one open since 1996 — with the twist that a 20-line '' (LLM plus compiler plus retry) matched a much more sophisticated evolutionary search E067. A 30B open model trained with a and a two-stage RL progression reached -gold-equivalent on proof writing E048. And on research-level mathematics, decomposing a single into seven coordinated over a shared whiteboard takes the same model from 0% to 8/10 on problems — a strong argument that organisation, not scale, is the contested axis here E076. DeepMind's broader 'co-mathematician' framing pushes against benchmark-as-progress-metric: the more important value may be helping mathematicians fail faster on dead ends and surface ambiguities in old problem statements E029.
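
The "LLM plus compiler plus retry" pattern mentioned above can be sketched in a few lines. This is a minimal illustration, not the system described in E067: `propose` and `check` are hypothetical callables standing in for a model call and an external checker, and the toy versions below exist only so the loop can be run.

```python
from typing import Callable, Optional, Tuple


def verify_with_retry(
    propose: Callable[[str, str], str],
    check: Callable[[str], Tuple[bool, str]],
    problem: str,
    max_attempts: int = 8,
) -> Optional[str]:
    """Return the first candidate the checker accepts, or None if none is."""
    feedback = ""  # the first attempt sees no checker output
    for _ in range(max_attempts):
        candidate = propose(problem, feedback)
        accepted, message = check(candidate)
        if accepted:
            return candidate
        feedback = message  # hand the checker's output to the next attempt
    return None


def toy_propose(problem: str, feedback: str) -> str:
    # Hypothetical stand-in for a model call: try the integer after the
    # one that was last rejected, starting from 0.
    return str(int(feedback) + 1) if feedback else "0"


def toy_check(candidate: str) -> Tuple[bool, str]:
    # Hypothetical stand-in for an external checker: accept n only if
    # n * n == 49, and echo the rejected candidate back as the "diagnostic".
    n = int(candidate)
    return n * n == 49, candidate


result = verify_with_retry(toy_propose, toy_check, "find n with n*n == 49", 10)
```

The point of the design is that acceptance is decided outside the proposer: nothing is returned unless the checker says yes, and running out of attempts yields `None` rather than an unverified guess.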

Agents running real instruments

A robot system made end-to-end autonomously and caught two deliberately sabotaged experiments, with the architectural pattern that matters being locked-down primitive 'atoms', LLM-composable 'molecules', and freely-designed 'assembly' procedures E072. A separate system ran an optical lab for 21 hours and produced a credible experiment showing that an interferometer can carry pairwise information structurally analogous to — though the framing partly works because a might find Transformer-shaped patterns E002. In numerical scientific computing, giving every method a geometric address in a unit cube and exploiting the conditional-independence structure of method choice lets an one-shot a 1968 NASA re-entry problem and discover a E042. Across all of these, the recurring claim is that the new bottleneck is wet-lab speed and hardware iteration cycles.
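
One way to read the atoms / molecules / assembly layering is as a permission boundary. The sketch below is an assumption-laden illustration, not the architecture from E072: the atom names (`dispense`, `heat`), their parameter ranges, and the list-of-steps plan format are all hypothetical.

```python
from types import MappingProxyType


def dispense(log, volume_ul):
    # Atom: fixed behaviour with a hard-coded safety range (hypothetical limits).
    if not 0 < volume_ul <= 1000:
        raise ValueError("volume out of range")
    log.append(f"dispense {volume_ul} uL")


def heat(log, celsius):
    # Atom: fixed behaviour with a hard-coded safety range (hypothetical limits).
    if not 20 <= celsius <= 95:
        raise ValueError("temperature out of range")
    log.append(f"heat to {celsius} C")


# Read-only registry: a planner can call atoms but cannot add or replace them.
ATOMS = MappingProxyType({"dispense": dispense, "heat": heat})


def run_molecule(steps, log):
    # Molecule: an ordered list of (atom_name, kwargs) pairs, e.g. composed by an LLM.
    # Steps run in order, so an invalid step stops execution at that point.
    for name, kwargs in steps:
        if name not in ATOMS:
            raise KeyError(f"unknown atom: {name}")
        ATOMS[name](log, **kwargs)


def run_assembly(molecules, log):
    # Assembly: a freely designed sequence of molecules.
    for steps in molecules:
        run_molecule(steps, log)


log = []
run_assembly(
    [[("dispense", {"volume_ul": 50})], [("heat", {"celsius": 60})]],
    log,
)
```

Under this reading, whatever the higher layers compose, every physical action still passes through an atom whose limits the planner cannot edit.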

The shape of LLM-driven optimisation

A Berkeley unification argues that , , , and are all running the same algorithm, and that '' (error traces, profiler dumps, failed-test diagnostics) is the LLM-era analog of a — producing state-of-the-art for $3 and lifting from 32% to ~90% E065. Agent-driven neural architecture search explores spaces rigid Bayesian or evolutionary methods can't, including an that spontaneously imported from object detection into a GPT training script E053. Verified distributed-systems code can now be synthesised in ten hours instead of nine months, with the proof obligation pushing toward representations that are both easier to verify and faster to run E075. The honest qualifier across all of these: most of the intellectual heavy lifting still lives in the proposer model and the design, not the orchestration layer.
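
The role of textual diagnostics in such a loop can be illustrated with a toy optimiser. This is a sketch under stated assumptions, not the algorithm from E065: `propose` and `evaluate` are hypothetical callables, and the "diagnostic" here is a one-word hint rather than a real error trace or profiler dump.

```python
def optimise(propose, evaluate, seed, rounds):
    """Keep the best-scoring candidate; feed its diagnostics to the proposer."""
    best = seed
    best_score, diagnostics = evaluate(seed)
    for _ in range(rounds):
        candidate = propose(best, diagnostics)
        score, candidate_diagnostics = evaluate(candidate)
        if score > best_score:  # accept strict improvements only
            best, best_score = candidate, score
            diagnostics = candidate_diagnostics
    return best, best_score


def toy_evaluate(x):
    # Hypothetical evaluator: the score peaks at x == 3, and the text hint
    # plays the part of an error trace.
    hint = "too low" if x < 3 else "too high" if x > 3 else "ok"
    return -(x - 3) ** 2, hint


def toy_propose(x, diagnostics):
    # Hypothetical proposer: reads the diagnostic text and moves one step.
    if diagnostics == "too low":
        return x + 1
    if diagnostics == "too high":
        return x - 1
    return x


best, best_score = optimise(toy_propose, toy_evaluate, seed=0, rounds=5)
```

The loop itself is deliberately thin, which is consistent with the qualifier above: the quality of the result depends almost entirely on what `propose` does with the diagnostics it is given.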

Episodes anchoring this topic