Multi-agent systems and emergent dynamics
Coordination as the real bottleneck
Wiring LLM protocol design into TLA+ model checking turns a counterexample trace into an evidence-driven repair signal, and verified protocols absorb roughly half the damage when you swap in a cheaper model, which makes verification an operational lever rather than correctness theatre E034. The harness side shows the same pattern: an agent's stand-down decisions are no more binding than sticky notes unless they persist as enforced policy E049. And in agent groups, even one weak model can unravel cooperative behaviour for everyone, a failure mode invisible under self-play evaluation E018.
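To make the counterexample-as-repair-signal idea concrete, here is a minimal sketch in Python rather than TLA+. It is not the method from E034: the two-agent protocol, the `guarded` flag, and all function names are hypothetical, chosen only to show how an explicit-state checker returns a trace that points at the rule needing repair.

```python
from collections import deque

def find_counterexample(init, next_states, invariant):
    """Breadth-first search over reachable states. Returns the shortest
    trace ending in a state that violates `invariant`, or None."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:  # walk parent links back to init
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

def make_protocol(guarded):
    """Hypothetical toy protocol: each agent is 'idle', 'wants' or 'holds'."""
    def next_states(state):
        for i in (0, 1):
            me, other = state[i], state[1 - i]
            if me == "idle":
                new = "wants"
            elif me == "wants":
                if guarded and other == "holds":
                    continue  # repaired rule: wait while the peer holds
                new = "holds"
            else:
                new = "idle"
            yield tuple(new if j == i else state[j] for j in (0, 1))
    return next_states

def mutual_exclusion(state):
    """Safety invariant: the two agents never hold the resource together."""
    return state != ("holds", "holds")

INIT = ("idle", "idle")
buggy_trace = find_counterexample(INIT, make_protocol(False), mutual_exclusion)
fixed_trace = find_counterexample(INIT, make_protocol(True), mutual_exclusion)

if __name__ == "__main__":
    print("buggy:", buggy_trace)
    print("fixed:", fixed_trace)
```

The trace for the unguarded protocol ends with one agent moving from `wants` to `holds` while its peer already holds, which is exactly the step a repair has to constrain. With the guard in place the checker finds no violating state.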
Organisation as a scaling axis
Three frozen agents with a shared parameter budget nearly double single-agent accuracy on physics, with progressive growth turning otherwise-untrainable architectures into trainable ones E060. Coupling two copies of the same model through their hidden states with a 1% bridge moves arithmetic from 36% to 96% and produces an emergent communication protocol from task loss alone E040. Training a backbone to delegate to copies of itself via recursive RL produces a 0%→88% phase transition on hard crafting tasks E028. The deep-research version replaces parallel voting with an evidence-DAG assembled by Searchers and read by a Navigator, getting parallel scaling to keep paying off where majority vote flattens E051. And the RMA seven-agent system over a shared whiteboard takes a 0%-baseline backbone to 80% on First Proof problems E076.
The capability paradox and semantic collapse
Upgrading the Worker model in a manager-auditor system can take attack success from 1-in-5 to 19-in-20 (20% to 95%), with about three-quarters of the effect mediated by linguistic certainty, which launders adversarial requests across the trust boundary E058. Two copies of the same frontier model can persuade each other to produce climate-denial content 100% of the time, suggesting guardrails behave like conversational positions rather than hard limits E045. And on the open-ended side, three LLMs talking for a thousand rounds grow new vocabulary while their semantic content stays anchored. Twelve different interventions (temperature, personas, model mixing, removing safety training, RL diversity training) all failed to break the pattern, and induction-head circuits provide a partial mechanistic story E073. The DPI (data processing inequality) argument in that paper has consequences for any closed-loop autonomous-research pipeline.
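Reading "DPI" as the data processing inequality (an assumption; the text above does not expand the acronym), the standard statement is below. The symbols $S$ and $T_k$ are illustrative labels introduced here, not notation from E073, and the paper's own formulation may differ.

```latex
% Data processing inequality: for a Markov chain X -> Y -> Z,
I(X;Z) \le I(X;Y).

% Illustrative closed-loop reading: if round k+1 is generated only from
% round k, then S -> T_k -> T_{k+1} is a Markov chain for any source S, so
I(S;T_{k+1}) \le I(S;T_k) \quad \text{for all } k \ge 1.
```

Under that reading, a loop that only reprocesses its own outputs cannot gain information about anything outside the loop. This is one way the argument would bear on closed-loop research pipelines.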
Episodes anchoring this topic
- 073-multi-llm-systems-exhibit-robust-semantic-collapse
Demonstrated robust semantic collapse in multi-LLM conversations across twelve interventions.
- 058-the-capability-paradox-how-smarter-auditors-make-multi-agent
Showed that more capable Workers can make multi-agent systems more vulnerable via confidence laundering.
- 060-neuromas-multi-agent-systems-as-neural-networks-with-joint-r
Showed that organisation is a scaling axis distinct from model size at a fixed parameter budget.
- 040-the-bicameral-model-bidirectional-hidden-state-coupling-betw
Coupled two frozen models through hidden states with an emergent task-driven protocol.
- 034-tracefix-repairing-agent-coordination-protocols-with-tla-cou
Wired model checking into protocol repair as an operational lever for cheaper models.
- 045-llm-based-persuasion-enables-guardrail-override-in-frontier-
Showed same-model persuasion can collapse single-turn safety in five turns.