Literature review · 6 episodes

Multi-Agent Systems: Coordination, Collapse, and Collective Intelligence

Coordination is the failure surface

The dominant failure mode in multi-agent LLM systems is not bad reasoning by any single agent; it is coordination bugs that no human spots on a casual read. Wiring LLM protocol design into TLA+ model checking catches deadlocks before deployment, and verified protocols absorb roughly half the damage of swapping in a cheaper model: verification as an operational buffer, not correctness theater E034. The most pointed recent result indicts the default architecture. Routing findings through a central manager both serializes parallel work and corrupts it (a manager paraphrased a correct answer into vagueness), and sharing through a boss makes attempts so correlated that Pass@4 is worse than with no sharing at all. A verified shared whiteboard with citation-anchored notes beats the boss by ten points at half the cost E130. Naive parallelism fails for the same correlation reason: 64 voting agents barely beat one because they sample the same mistakes, whereas a shared evidence graph lets parallel scaling keep paying off E051. Document review across partitioned workers shows the structural ceiling: when no agent reads the whole document, cross-section defect detection collapses by 74-100% regardless of model capability E087.
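The correlation argument can be made concrete with a toy calculation. This is an illustrative sketch, not the method of any cited episode. It assumes each attempt is correct with probability p, and that with probability rho all attempts collapse onto one shared outcome while otherwise they are independent. The names pass_at_k, majority_accuracy, p, and rho are hypothetical, and the voting function further assumes a binary right-or-wrong answer with ties split evenly.

```python
from math import comb


def pass_at_k(p: float, k: int, rho: float) -> float:
    """Chance that at least one of k attempts is correct.

    Toy mixture model (an assumption): with probability rho every attempt
    shares one outcome, so extra attempts add nothing; otherwise the
    attempts are independent.
    """
    independent = 1.0 - (1.0 - p) ** k
    return rho * p + (1.0 - rho) * independent


def majority_accuracy(p: float, n: int, rho: float) -> float:
    """Chance that a majority vote of n agents is correct.

    Assumes a binary answer (correct vs. one wrong alternative) and
    counts an exact tie as half a win.
    """
    independent = 0.0
    for correct in range(n + 1):
        prob = comb(n, correct) * p**correct * (1.0 - p) ** (n - correct)
        if 2 * correct > n:
            independent += prob
        elif 2 * correct == n:
            independent += 0.5 * prob
    return rho * p + (1.0 - rho) * independent


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 1.0):
        print(
            f"rho={rho:.1f}  "
            f"pass@4={pass_at_k(0.3, 4, rho):.3f}  "
            f"vote64={majority_accuracy(0.6, 64, rho):.3f}"
        )
```

Under these assumptions, four independent attempts at p = 0.3 succeed about 76% of the time, while four fully correlated attempts do no better than one (30%). Likewise, 64 independent voters at p = 0.6 are right more than 90% of the time, but at rho = 0.9 the vote lands near 63%, barely above a single agent.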

Organization is a scaling axis

Held-constant comparisons make the case that structure itself scales. A fixed trainable-parameter budget split across three communicating agents nearly doubles accuracy versus one agent with the whole budget — with the caveat that identical architectures succeed or fail depending on whether they were grown progressively or trained from scratch E060. Freezing the agents and training only a small communication hub lifts per-agent accuracy from 36% to 58% on hard search E083, and even the handoff timing matters: streaming a reasoning chain so downstream agents anchor on the clean head before the rotting tail arrives beats whole-chain transfer E116. The exotic end is market mechanisms: deliberately hobbled agents with virtual money, auctions, and backward payments self-organize into teams that beat an unrestricted soloist, with the price mechanism solving credit assignment for free E107. Recursive delegation pushes the axis into the weights themselves — training a model to hire copies of itself produces phase transitions on hard tasks and lets a 30B model match frontier systems on inputs six times its context window E028.
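One way to picture how auctions with backward payments could route credit is the toy pipeline below. It is a hypothetical sketch loosely in the spirit of the classic bucket-brigade idea, not the mechanism of the cited episode. The Agent class, run_pipeline function, first-price rule, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    balance: float  # virtual money; starting values are arbitrary


def run_pipeline(stages, bids, reward):
    """Run one task through a sequence of auction stages.

    stages: list of candidate lists, one list per pipeline stage.
    bids:   dict mapping agent name to its (fixed, hypothetical) bid.
    reward: external payoff handed to the final stage's winner.

    At each stage the highest bidder wins and pays its bid backward to
    the winner of the previous stage, so upstream work is paid by the
    downstream agents that chose to build on it.
    """
    previous_winner = None
    winners = []
    for candidates in stages:
        winner = max(candidates, key=lambda agent: bids[agent.name])
        price = bids[winner.name]
        winner.balance -= price
        if previous_winner is not None:
            previous_winner.balance += price  # the backward payment
        previous_winner = winner
        winners.append(winner.name)
    if previous_winner is not None:
        previous_winner.balance += reward  # external reward enters at the end
    return winners


if __name__ == "__main__":
    a, b = Agent("a", 10.0), Agent("b", 10.0)
    c, d = Agent("c", 10.0), Agent("d", 10.0)
    order = run_pipeline([[a, b], [c, d]], {"a": 2, "b": 1, "c": 3, "d": 4}, 5.0)
    print(order, [(x.name, x.balance) for x in (a, b, c, d)])
```

In this toy run, agent a pays 2 to enter and receives 4 from d, ending at 12, while d pays 4 and collects the reward of 5, ending at 11. No central orchestrator assigns credit; the balances shift only through the bids.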

Diversity collapses unless engineered in

The deflationary result hanging over autonomous multi-agent research pipelines: put three LLMs in open-ended conversation and semantic content barely moves over a thousand rounds — roughly three times more anchored than human threads — and temperature, personas, model mixing, and even diversity-trained RL all fail to break the pattern, with induction-head copying identified as the mechanism E073. The constructive counterpoint is that divergence can be engineered at the organizational layer: a lab-shaped team with shared logs, peer critique, dead-end registries, and replication gates found seven genuine improvements to a training pipeline where a matched-budget lone agent found zero E095, and an open ecosystem of anonymous agents sharing results and failed attempts on a public forum relay-raced a 40-year-old kissing-number record past AlphaEvolve E129. The pattern: collective intelligence comes from infrastructure that preserves and circulates disagreement, not from conversation itself.
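A claim like "semantic content barely moves" presupposes some measure of how far a conversation has drifted from where it started. The source does not say which measure was used. The sketch below is a crude stand-in that uses bag-of-words cosine distance where a real study would presumably use learned embeddings. The function names and the sample messages are hypothetical.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector; a crude stand-in for a sentence embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_from_start(messages: list[str]) -> list[float]:
    """Cosine distance of every message from the first one.

    A flat curve near zero means the conversation stays anchored to
    its opening; a rising curve means the content is moving.
    """
    start = bow(messages[0])
    return [1.0 - cosine(start, bow(m)) for m in messages]


if __name__ == "__main__":
    thread = [
        "agents should share a verified whiteboard",
        "agents should share a verified board",
        "markets price credit assignment instead",
    ]
    print([round(x, 3) for x in drift_from_start(thread)])
```

Tracking this curve over many rounds, for both LLM and human threads, is one simple way to compare how anchored each population is, under the caveat that word overlap only approximates semantic similarity.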

Episodes anchoring this topic

When Splitting One Model Across Three Agents Doubles Its Accuracy
The controlled parameter-budget experiment establishing organization itself as a scaling axis.
Why AI Agents Coordinate Better Through a Shared Board Than a Boss
The indictment of manager-centric architectures and the verified shared-context alternative that beats them at half the cost.
When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
The robust negative result that LLM populations don't generate semantic diversity, across twelve intervention families.
Seven Wins to Zero: How Organizing AI Agents Like a Lab Changes the Search
The seven-versus-zero result showing lab-style coordination protocols, not smarter agents, drive long-horizon discovery.
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
The demonstration that the communication layer is the right thing to train, with frozen agents and an RL-trained hub.
How a Market of Crippled AI Agents Outscored One Unrestricted Model
The demonstration that market mechanisms solve credit assignment and workflow design without any designed orchestration.