The increasing adoption of Large Language Models (LLMs) has enabled AI scientists to perform complex end-to-end scientific discovery tasks that require coordinating specialized roles, including idea generation and experimental execution. However, most state-of-the-art AI scientist systems rely on static, hand-designed pipelines and do not adapt to accumulated interaction histories. As a result, these systems overlook promising research directions, repeat failed experiments, and pursue infeasible ideas. To address this, we introduce EvoScientist, an evolving multi-agent AI scientist framework that continuously improves its research strategies through persistent memory and self-evolution. EvoScientist comprises three specialized agents: a Researcher Agent (RA) for scientific idea generation, an Engineer Agent (EA) for experiment implementation and execution, and an Evolution Manager Agent (EMA) that distills insights from prior interactions into reusable knowledge. EvoScientist contains two persistent memory modules: (i) an ideation memory, which summarizes feasible research directions from top-ranked ideas while recording previously unsuccessful directions; and (ii) an experimentation memory, which captures effective data processing and model training strategies derived from code search trajectories and best-performing implementations. These modules enable the RA and EA to retrieve relevant prior strategies, improving idea quality and code execution success rates over time. Experiments show that EvoScientist outperforms 7 open-source and commercial state-of-the-art systems in scientific idea generation, achieving higher novelty, feasibility, relevance, and clarity under both automatic and human evaluation. EvoScientist also substantially improves code execution success rates through multi-agent evolution, demonstrating the effectiveness of persistent memory for end-to-end scientific discovery.
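The two memory modules and their retrieval step can be illustrated with a minimal sketch. The abstract does not specify how entries are stored or ranked, so everything below is an assumption made for illustration: the names `MemoryEntry` and `PersistentMemory` are hypothetical, the stored strings are placeholders, and the word-overlap score is a toy stand-in for whatever retrieval method EvoScientist actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One distilled insight (hypothetical structure, not from the paper)."""
    text: str         # e.g. a research direction or a training strategy
    successful: bool  # False marks a direction or strategy that failed before


@dataclass
class PersistentMemory:
    """Toy store standing in for the ideation or experimentation memory."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, text: str, successful: bool) -> None:
        self.entries.append(MemoryEntry(text, successful))

    def retrieve(self, query: str, k: int = 3) -> list[MemoryEntry]:
        # Assumed relevance score: count of lowercase words shared with the query.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(entry.text.lower().split())), entry)
            for entry in self.entries
        ]
        relevant = [pair for pair in scored if pair[0] > 0]
        # Stable sort: entries with equal scores keep their insertion order.
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in relevant[:k]]


# One store per module; in this sketch an EMA-like component would call add(),
# while the RA and EA would call retrieve() before proposing ideas or writing code.
ideation_memory = PersistentMemory()
experimentation_memory = PersistentMemory()

# Placeholder entries, invented purely to exercise the sketch.
ideation_memory.add("contrastive pretraining for graph retrieval", successful=True)
ideation_memory.add("graph retrieval with full fine-tuning on tiny data", successful=False)
experimentation_memory.add("normalize features before training", successful=True)

hits = ideation_memory.retrieve("graph retrieval ideas")
```

Keeping the failed direction in the same store as the successful one reflects the abstract's description of an ideation memory that records unsuccessful directions alongside feasible ones, so a retrieving agent can see both what to pursue and what to avoid.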