We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond what sequential reasoning permits under a fixed context window. PaCoRe departs from the traditional sequential paradigm by driving TTC through massive parallel exploration coordinated across multiple rounds via a message-passing architecture. Each round launches many parallel reasoning trajectories, compacts their findings into context-bounded messages, and synthesizes these messages to guide the next round and, ultimately, to produce the final answer. Trained end-to-end with large-scale, outcome-based reinforcement learning, the model masters the synthesis abilities PaCoRe requires and scales to multi-million-token effective TTC without exceeding context limits. The approach yields strong improvements across diverse domains and notably surpasses frontier systems in mathematics: by scaling effective TTC to roughly two million tokens, an 8B model reaches 94.5% on HMMT 2025, exceeding GPT-5's 93.2%. We open-source model checkpoints, training data, and the full inference pipeline to accelerate follow-up work.
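To make the round structure concrete, the following is a minimal Python sketch of the loop the abstract describes: each round samples many parallel trajectories, compacts each one into a context-bounded message, and synthesizes those messages into guidance for the next round, or into the final answer on the last round. The function names (`sample_trajectory`, `compact_to_message`, `synthesize`), the thread-pool driver, and the default round/width parameters are illustrative assumptions, not the released pipeline's API.

```python
# Minimal sketch of the PaCoRe round loop described in the abstract.
# All helper names and parameters here are illustrative assumptions;
# plug in a real model client in place of the dummy `echo_model`.
from concurrent.futures import ThreadPoolExecutor

def sample_trajectory(model, problem: str, guidance: str) -> str:
    """Run one independent reasoning trajectory (placeholder prompt)."""
    return model(f"Problem: {problem}\nGuidance: {guidance}\nReason step by step.")

def compact_to_message(model, trajectory: str, budget_tokens: int = 512) -> str:
    """Compress a trajectory's findings into a context-bounded message."""
    return model(f"Summarize the key findings in <= {budget_tokens} tokens:\n{trajectory}")

def synthesize(model, problem: str, messages: list[str], last_round: bool) -> str:
    """Fuse compacted messages into next-round guidance or the final answer."""
    goal = "Give the final answer." if last_round else "Write guidance for the next round."
    return model(f"Problem: {problem}\nFindings:\n" + "\n".join(messages) + f"\n{goal}")

def pacore(model, problem: str, num_rounds: int = 3, num_parallel: int = 16) -> str:
    guidance = ""
    for r in range(num_rounds):
        # Launch many parallel reasoning trajectories for this round.
        with ThreadPoolExecutor(max_workers=num_parallel) as pool:
            trajectories = list(pool.map(
                lambda _: sample_trajectory(model, problem, guidance),
                range(num_parallel)))
        # Compact each trajectory so the synthesizer stays within the context window.
        messages = [compact_to_message(model, t) for t in trajectories]
        # Synthesize messages into guidance, or the answer on the last round.
        guidance = synthesize(model, problem, messages, last_round=(r == num_rounds - 1))
    return guidance

if __name__ == "__main__":
    # Dummy "model" so the sketch runs end to end; replace with a real LLM call.
    echo_model = lambda prompt: f"[model output for: {prompt[:40]}...]"
    print(pacore(echo_model, "Compute 1 + 1."))
```

Under this sketch, the effective TTC is roughly `num_rounds * num_parallel` trajectory lengths plus the compaction and synthesis calls, while no single call ever sees more than one trajectory or one round's worth of bounded messages.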