Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)

This paper introduces a novel Deep Researcher architecture that generates detailed research reports on complex, PhD-level topics while addressing the inherent limitations of the parallel scaling paradigm. Our system rests on two key innovations: Sequential Research Plan Refinement via Reflection and a Candidates Crossover algorithm. We show that sequential refinement is an efficient method that lets the agent maintain a centralized Global Research Context, so that it can look back at its progress, reason about the research plan, and revise that plan at runtime. This dynamic adaptation contrasts with parallel approaches, which often suffer from siloed knowledge. The Candidates Crossover algorithm further improves search efficiency by deploying multiple LLM candidates with varied parameters to explore a larger search space, then synthesizing their findings into a comprehensive final research response. The process concludes with One-Shot Report Generation, which ensures that the final document has a unified narrative and high fact density. Powered by the Gemini 2.5 Pro model, our Deep Researcher was evaluated on DeepResearch Bench, a globally recognized benchmark of 100 doctoral-level research tasks. Our architecture achieved an overall score of 46.21, surpassing leading deep research agents such as Claude Researcher, Nvidia AIQ Research Assistant, Perplexity Research, Kimi Researcher, and Grok Deeper Search on the live DeepResearch Bench leaderboard. This result marginally exceeds our previous work, Static DRA, and reinforces the finding that sequential scaling consistently outperforms the parallel self-consistency paradigm.
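
The control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`GlobalContext`, `crossover`, `run_research`, `one_shot_report`, and the toy stubs) is hypothetical, sampling temperature is only an assumed example of the "varied parameters", and the candidate and reflection functions are deterministic stand-ins for what would be LLM and search calls in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class GlobalContext:
    """Centralized record of the plan, the completed steps, and all findings."""
    plan: List[str]
    completed: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)


# Hypothetical signatures: a candidate maps (step, temperature) to findings,
# and a reflector maps the global context to a revised remaining plan.
Candidate = Callable[[str, float], List[str]]
Reflector = Callable[[GlobalContext], List[str]]


def crossover(step: str, candidate: Candidate,
              temperatures: Sequence[float]) -> List[str]:
    """Run one candidate per temperature and merge findings without duplicates."""
    merged: List[str] = []
    for temperature in temperatures:
        for finding in candidate(step, temperature):
            if finding not in merged:
                merged.append(finding)
    return merged


def run_research(plan: Sequence[str], candidate: Candidate, reflect: Reflector,
                 temperatures: Sequence[float] = (0.2, 0.7, 1.0),
                 max_steps: int = 10) -> GlobalContext:
    """Execute plan steps one at a time, reflecting on the plan after each step."""
    ctx = GlobalContext(plan=list(plan))
    while ctx.plan and len(ctx.completed) < max_steps:
        step = ctx.plan.pop(0)
        for finding in crossover(step, candidate, temperatures):
            if finding not in ctx.findings:
                ctx.findings.append(finding)
        ctx.completed.append(step)
        # Reflection sees everything gathered so far and may rewrite the plan.
        ctx.plan = reflect(ctx)
    return ctx


def one_shot_report(question: str, ctx: GlobalContext) -> str:
    """Build the report in a single pass over the complete global context."""
    lines = [f"Report: {question}"] + [f"- {f}" for f in ctx.findings]
    return "\n".join(lines)


def toy_candidate(step: str, temperature: float) -> List[str]:
    """Stub candidate: higher temperatures surface one additional finding."""
    findings = [f"{step}: core fact"]
    if temperature >= 0.7:
        findings.append(f"{step}: extra angle")
    return findings


def toy_reflect(ctx: GlobalContext) -> List[str]:
    """Stub reflection: after the first step, append one follow-up step."""
    plan = list(ctx.plan)
    if len(ctx.completed) == 1 and "follow-up" not in plan:
        plan.append("follow-up")
    return plan
```

Calling `run_research(["background", "methods"], toy_candidate, toy_reflect)` executes three steps rather than two, because the reflection stub extends the plan after seeing the first findings. The single shared `GlobalContext` is what distinguishes this loop from a parallel design, in which each branch would hold only its own findings.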
