Literature review · 6 episode(s)

Multi-Agent Systems: Coordination, Collapse, and Collective Intelligence

← all topics  ·  Glossary →

Coordination is the failure surface

The dominant failure mode in multi- LLM systems isn't bad reasoning by any agent; it's coordination bugs no human spots on a casual read. Wiring LLM protocol design into TLA+ model checking catches before deployment, and verified protocols absorb roughly half the damage of swapping in a cheaper model — verification as an operational buffer, not correctness theater E034. The most pointed recent result indicts the default architecture: routing findings through a central manager both serializes parallel work and corrupts it — a manager paraphrased a correct answer into vagueness — and sharing through a boss makes attempts so correlated that Pass@4 gets worse than not sharing at all. A verified shared whiteboard with citation-anchored notes beats the boss by ten points at half the cost E130. Naive parallelism fails for the same correlation reason: 64 voting agents barely beat one because they sample the same mistakes, whereas a shared makes parallel scaling keep paying E051. Document review across partitioned workers shows the structural ceiling — when no agent reads the whole document, collapses 74-100% regardless of model E087.

Organization is a scaling axis

Held-constant comparisons make the case that structure itself scales. A fixed trainable-parameter budget split across three communicating nearly doubles accuracy versus one agent with the whole budget — with the caveat that identical architectures succeed or fail depending on whether they were grown progressively or trained from scratch E060. Freezing the agents and training only a small communication lifts per-agent accuracy from 36% to 58% on hard search E083, and even the handoff timing matters: streaming a reasoning chain so downstream agents anchor on the clean head before the rotting tail arrives beats whole-chain transfer E116. The exotic end is market mechanisms: deliberately hobbled agents with virtual money, auctions, and backward payments self-organize into teams that beat an unrestricted soloist, with the price mechanism solving for free E107. Recursive delegation pushes the axis into the themselves — training a model to hire copies of itself produces on hard tasks and lets a 30B model match frontier systems on inputs six times its E028.

Diversity collapses unless engineered in

The deflationary result hanging over autonomous multi- research pipelines: put three LLMs in open-ended conversation and semantic content barely moves over a thousand rounds — roughly three times more anchored than human threads — and , personas, model mixing, and even diversity-trained all fail to break the pattern, with induction-head copying identified as the mechanism E073. The constructive counterpoint is that divergence can be engineered at the organizational layer: a lab-shaped team with shared logs, peer critique, dead-end registries, and replication gates found seven genuine improvements to a training where a matched-budget lone agent found zero E095, and an open ecosystem of anonymous agents sharing results and failed attempts on a public forum relay-raced a 40-year-old kissing-number record past E129. The pattern: collective intelligence comes from infrastructure that preserves and circulates disagreement, not from conversation itself.

Episodes anchoring this topic