Concept · 19 episodes

Parallel Sampling

Definition

Parallel sampling generates many candidate responses from a model at once and then picks among them: by majority vote, by a verifier, or by a reward model. It is a simple way to trade inference compute for quality, and it is the mechanism behind both the pass@k metric (does at least one of k samples succeed?) and self-consistency (take the most common answer across samples).
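A minimal sketch of the idea in Python, under stated assumptions: `toy_generate` is a hypothetical stand-in for a real model call, and the helper names (`sample_parallel`, `majority_vote`, `best_by_score`) are illustrative, not taken from any episode or library. The `pass_at_k` function uses the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn and c is the number that were correct.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import comb


def toy_generate(prompt, seed):
    # Hypothetical stand-in for a model call: returns "42" about 60% of
    # the time and a random single digit otherwise.
    rng = random.Random(seed)
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 9))


def sample_parallel(generate, prompt, n):
    # Draw n candidates concurrently, one seed per candidate.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda seed: generate(prompt, seed), range(n)))


def majority_vote(answers):
    # Self-consistency style selection: keep the most frequent answer.
    return Counter(answers).most_common(1)[0][0]


def best_by_score(candidates, score):
    # Verifier or reward-model style selection: keep the top-scoring one.
    return max(candidates, key=score)


def pass_at_k(n, c, k):
    # Probability that at least one of k samples, drawn without
    # replacement from n samples of which c are correct, is correct.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    candidates = sample_parallel(toy_generate, "What is 6 * 7?", n=16)
    print("majority vote:", majority_vote(candidates))
    print("pass@4 with 3 of 16 correct:", pass_at_k(16, 3, 4))
```

The two selectors mark the main design choice: voting needs answers that can be compared for equality, while a scoring function (a verifier or reward model) can rank free-form outputs but is only as good as the scorer.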

Episodes covering this

193
Freeze Most of the Network: Where RL Improvement Actually Lives in a Transformer
Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
Zhang, Hu, Glentis et al. · University of Minnesota·22 min·Jul 02, 2026
191
How One Researcher Beat GPT-5.2 and Gemini 3 by Judging Their Answers, Not Improving Them
Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2
Land · Independent Researcher·26 min·Jul 02, 2026
188
A Coding Agent Found a Hole in a Peer-Reviewed STOC Proof for Five Dollars
Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics
Moakhar, Gholami, Springer et al. · University of Maryland·20 min·Jul 02, 2026
179
How DeepSeek Made One User Faster Without Slowing Down the Crowd
DSpark: Confidence-Scheduled Speculative Decoding with
Xin Cheng, Xingkai Yu, Chenze Shao et al. · Peking University / DeepSeek-AI·23 min·Jun 27, 2026
133
How MiniMax Turned a Reward-Hacking Disaster Into Olympiad Gold
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
Chen, Zhang, Zhang et al. · MiniMax / The Chinese University of Hong Kong·34 min·Jun 12, 2026
131
Why Autonomous Research Agents Forget Their Own Lessons, and Arbor's Fix
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Jin, Hu, Qiu et al. · Renmin University of China·33 min·Jun 11, 2026
130
Why AI Agents Coordinate Better Through a Shared Board Than a Boss
Decentralized Multi-Agent Systems with Shared Context
Mao, Mirhoseini · Stanford University·34 min·Jun 11, 2026
120
How an AI Agent Rewrites Its Own Tools, Without an Answer Key
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Pan, Liu, Lin et al. · City University of Hong Kong·30 min·Jun 05, 2026
119
Beating Reinforcement Learning Without Ever Touching the Model's Weights
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents
Hwang, Suri, Villecroze et al. · Layer6 AI·22 min·Jun 05, 2026
117
How an Open AI System Verified 672 Hard Math Proofs for Under $300
Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement
Chung, Cai, Li et al. · Princeton University·26 min·Jun 05, 2026
115
Teaching a Phone Agent to Reason Silently, And Keeping It Honest
MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models
Yang, Hu, Hao et al. · Beihang University·24 min·Jun 04, 2026
110
How an Agent Got 44 Points Better by Mining Its Own Scratch Paper
Inducing Reasoning Primitives from Agent Traces
Lei, Yan, Momo et al. · Carnegie Mellon University·27 min·Jun 03, 2026
101
Treating Math Formalization Like a Codebase, and Where the Agents Cheat
Formalizing Mathematics at Scale
Rammal, Patel, Gloeckle et al. · FAIR at Meta / CERMICS·27 min·May 29, 2026
083
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
Hu, Qian, Wang et al. · GSAI·24 min·May 26, 2026
076
Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math
RMA: an Agentic System for Research-Level Mathematical Problems
Zhao, Yuan, Choi et al. · Georgia Institute of Technology·22 min·May 25, 2026
067
An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
Advancing Mathematics Research with AI-Driven Formal Proof Search
Tsoukalas, Kovsharov, Shirobokov et al. · Google DeepMind·31 min·May 22, 2026
063
Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
Winston, Wang, Mirhoseini et al. · Stanford University·26 min·May 21, 2026
051
Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
Argus: Evidence Assembly for Scalable Deep Research Agents
Zhang, Su, Chen et al. · MiroMind AI·22 min·May 18, 2026
003
How to Pick the Best of Sixteen Coding Agent Rollouts
Scaling Test-Time Compute for Agentic Coding
Kim, Yang, Niu et al. · Meta Superintelligence Labs / University of Washington·17 min·May 01, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.