Existing multi-agent reasoning systems adopt a "generate-then-transfer" paradigm, in which each agent completes its entire reasoning chain before passing it on, so end-to-end latency scales linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thereby reducing latency. Surprisingly, this pipelining also improves effectiveness. Multi-step reasoning quality is non-uniform, and early steps are more reliable than later ones, so building on these reliable early steps rather than the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of the stream, serial, and single protocols, deriving their effectiveness ordering, a speedup upper bound, and the cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp; max +22.4 pp on HMMT 2026 with Claude Opus 4.6-high). In addition, we discover a "step-level scaling law": increasing the number of reasoning steps per agent consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to, and composable with, agent-count scaling.
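To make the latency argument concrete, here is a minimal Python sketch of a toy pipeline model. It is not the paper's closed-form analysis. It assumes a Chain of `k` agents that each emit `s` steps of equal duration `t`, ignores transfer cost, and assumes that step `j` of an agent depends only on its own step `j-1` and on step `j` of its upstream agent. The function names are hypothetical.

```python
# Toy latency model for a Chain of agents (hypothetical sketch, not the
# paper's analysis). Assumptions: k agents, s steps each, every step takes
# t seconds, transfer cost is zero, and agent i's step j needs only its own
# step j-1 and agent (i-1)'s step j.


def serial_latency(k: int, s: int, t: float) -> float:
    """Generate-then-transfer: each agent waits for the full upstream chain."""
    return k * s * t


def stream_latency(k: int, s: int, t: float) -> float:
    """Streaming: compute per-step finish times with dynamic programming."""
    # finish[i][j] = time at which agent i finishes step j; row 0 and
    # column 0 are zeros standing in for "no dependency".
    finish = [[0.0] * (s + 1) for _ in range(k + 1)]
    for i in range(1, k + 1):
        for j in range(1, s + 1):
            # Step j starts once upstream step j and own step j-1 are both done.
            ready = max(finish[i - 1][j], finish[i][j - 1])
            finish[i][j] = ready + t
    return finish[k][s]


def speedup(k: int, s: int, t: float = 1.0) -> float:
    """Ratio of serial to streaming latency under the toy model."""
    return serial_latency(k, s, t) / stream_latency(k, s, t)


if __name__ == "__main__":
    # Speedup grows with steps per agent s and approaches k = 3 from below.
    for s in (1, 10, 100):
        print(f"k=3, s={s}: speedup = {speedup(3, s):.3f}")
```

Under these assumptions, streaming finishes in (s + k - 1)·t instead of k·s·t, so the speedup k·s/(s + k - 1) never exceeds k and approaches it as s grows. This is consistent with the efficiency side of the step-level scaling claim in a simplified setting, though the paper's own speedup bound may be defined differently.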