Modern scientific research relies on large-scale data, complex workflows, and specialized tools, which existing LLMs and tool-based agents struggle to handle because of limitations in long-horizon planning, robust goal maintenance, and continual learning from execution. To address these issues, we propose S1-NexusAgent, a self-evolving agent framework for multidisciplinary scientific research. S1-NexusAgent adopts a hierarchical Plan-and-CodeAct execution paradigm: a dual-loop architecture decouples global scientific planning from subtask-level tool execution, enabling stable modeling of complex research workflows. The system natively supports the Model Context Protocol (MCP), integrates up to thousands of cross-disciplinary scientific tools, and orchestrates heterogeneous research tools efficiently through intention-aware dynamic tool retrieval and hot-plug mechanisms. To address the long-context and large-scale data challenges of scientific settings, S1-NexusAgent introduces object-reference-based sparse context management, which isolates sub-task contexts and compresses intermediate results. Building on this, a Critic Agent automatically evaluates complete execution trajectories and distills high-quality research paths into reusable Scientific Skills, closing the loop for continuous self-evolution in sustained, long-horizon scientific research. Experiments on authoritative scientific benchmarks that involve long-horizon planning and complex specialized tool orchestration, including biomini-eval (biology), ChemBench (chemistry), and MatSciBench (materials science), show that S1-NexusAgent achieves state-of-the-art performance, validating its effectiveness and generalization capability on complex scientific tasks.
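The abstract does not specify how object-reference-based sparse context management is implemented. As a rough illustration of the general idea only, the following minimal Python sketch keeps large intermediate results in a store and passes short handles (a reference plus a summary) back to the planner. The names `ObjectStore`, `load_measurements`, `mean_of`, and `planner_context` are hypothetical and are not taken from S1-NexusAgent.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class ObjectStore:
    """Hypothetical store that keeps large results out of the planner's context."""
    _objects: Dict[str, Any] = field(default_factory=dict)
    _counter: Iterator[int] = field(default_factory=itertools.count)

    def put(self, value: Any, summary: str) -> Dict[str, str]:
        # Store the full object and return only a compact handle.
        ref = f"obj_{next(self._counter)}"
        self._objects[ref] = value
        return {"ref": ref, "summary": summary}

    def get(self, ref: str) -> Any:
        # Dereference a handle inside the subtask that needs the data.
        return self._objects[ref]


def load_measurements(store: ObjectStore) -> Dict[str, str]:
    # Illustrative subtask: produces a large intermediate result.
    data = [float(i) for i in range(100_000)]
    return store.put(data, summary=f"list of {len(data)} floats")


def mean_of(store: ObjectStore, handle: Dict[str, str]) -> Dict[str, str]:
    # Illustrative subtask: resolves the handle locally, returns a new handle.
    data = store.get(handle["ref"])
    return store.put(sum(data) / len(data), summary="mean of measurements")


store = ObjectStore()
planner_context: List[Dict[str, str]] = []  # what the planner would see

h1 = load_measurements(store)
planner_context.append(h1)
h2 = mean_of(store, h1)
planner_context.append(h2)

print(planner_context)        # only handles and summaries, not the raw data
print(store.get(h2["ref"]))   # the full value is available on demand
```

In this sketch the planner's context grows by one short handle per subtask regardless of how large the underlying object is, which is one plausible reading of "sub-task context isolation and intermediate result compression".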