We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that scale only model size or context length, MiroThinker explores interaction scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement. Unlike LLM test-time scaling, which operates in isolation and risks degradation with longer reasoning chains, interaction scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories. Through reinforcement learning, the model achieves efficient interaction scaling: with a 256K context window, it can perform up to 600 tool calls per task, enabling sustained multi-turn reasoning and complex real-world research workflows. Across four representative benchmarks (GAIA, HLE, BrowseComp, and BrowseComp-ZH), the 72B variant achieves up to 81.9%, 37.7%, 47.1%, and 55.6% accuracy, respectively, surpassing previous open-source agents and approaching commercial counterparts such as GPT-5-high. Our analysis shows that MiroThinker benefits consistently from interaction scaling: research performance improves predictably as the model engages in deeper and more frequent agent-environment interactions, demonstrating that interaction depth exhibits scaling behavior analogous to that of model size and context length. These findings establish interaction scaling as a third critical dimension for building next-generation open research agents, complementing model capacity and context windows.
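To make the two budgets in the abstract concrete (up to 600 tool calls per task inside a 256K context window), here is a minimal sketch of an agent-environment loop that stops when either budget runs out. It is an illustration only, not MiroThinker's implementation. Every name in it (`Step`, `Trajectory`, `run_agent`, `count_tokens`, the toy policy and tool) is hypothetical, the whitespace token count stands in for a real tokenizer, and reading "256K" as 256,000 tokens is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MAX_TOOL_CALLS = 600      # per-task cap quoted in the abstract
CONTEXT_TOKENS = 256_000  # "256K" read as 256,000 tokens (assumption)


@dataclass
class Step:
    """One policy decision: set `tool` to call a tool, or `answer` to stop."""
    tool: Optional[str] = None
    argument: str = ""
    answer: Optional[str] = None


@dataclass
class Trajectory:
    messages: List[str] = field(default_factory=list)
    tool_calls: int = 0
    answer: Optional[str] = None


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real system would use the model's tokenizer.
    return len(text.split())


def run_agent(
    task: str,
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_tool_calls: int = MAX_TOOL_CALLS,
    context_tokens: int = CONTEXT_TOKENS,
) -> Trajectory:
    """Alternate policy decisions and tool observations until a budget is hit."""
    traj = Trajectory(messages=[task])
    used = count_tokens(task)
    while traj.tool_calls < max_tool_calls:
        step = policy(traj.messages)
        if step.answer is not None:
            traj.answer = step.answer
            break
        # Environment feedback: the observation is what lets later steps
        # correct earlier mistakes.
        observation = tools[step.tool](step.argument)
        traj.tool_calls += 1
        cost = count_tokens(observation)
        if used + cost > context_tokens:
            break  # the observation would overflow the context window
        traj.messages.append(observation)
        used += cost
    return traj


# Toy usage: a policy that searches twice, then answers.
def toy_policy(messages: List[str]) -> Step:
    if len(messages) < 3:
        return Step(tool="search", argument=f"query {len(messages)}")
    return Step(answer=f"answer after {len(messages) - 1} observations")


toy_tools = {"search": lambda q: f"result for {q}"}
result = run_agent("find X", toy_policy, toy_tools)
```

In this sketch, interaction depth is simply `tool_calls`. Raising `max_tool_calls` only helps if the policy makes good use of the extra observations, which is the capability the abstract says is trained through reinforcement learning.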