AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research

Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun

Generating deep research reports requires large-scale information acquisition and the synthesis of insight-driven analysis, posing a significant challenge for current language models. Most existing approaches follow a plan-then-write paradigm, whose performance heavily depends on the quality of the initial outline. However, constructing a comprehensive outline itself demands strong reasoning ability, causing current deep research systems to rely almost exclusively on closed-source or online large models. This reliance raises practical barriers to deployment and introduces safety and privacy concerns for user-authored data. In this work, we present AgentCPM-Report, a lightweight yet high-performing local solution composed of a framework that mirrors the human writing process and an 8B-parameter deep research agent. Our framework uses a Writing As Reasoning Policy (WARP), which enables models to dynamically revise outlines during report generation. Under this policy, the agent alternates between Evidence-Based Drafting and Reasoning-Driven Deepening, jointly supporting information acquisition, knowledge refinement, and iterative outline evolution. To effectively equip small models with this capability, we introduce a Multi-Stage Agentic Training strategy, consisting of cold-start, atomic skill RL, and holistic pipeline RL. Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym demonstrate that AgentCPM-Report outperforms leading closed-source systems, with substantial gains in Insight.
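The alternation described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the abstract does not specify how either step works, so the names `Report`, `warp_loop`, `draft`, `deepen`, and `max_rounds` are hypothetical, and the two steps are passed in as plain callables standing in for the retrieval-backed agent. The sketch only shows the control flow in which the outline is revised during writing rather than fixed up front.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Report:
    """Hypothetical container: an evolving outline plus the text written so far."""
    outline: List[str]
    sections: Dict[str, str] = field(default_factory=dict)


def _draft_pending(report: Report, draft: Callable[[str], str]) -> None:
    # Drafting step (stand-in): write every outline heading that has no text yet.
    for heading in report.outline:
        if heading not in report.sections:
            report.sections[heading] = draft(heading)


def warp_loop(
    query: str,
    draft: Callable[[str], str],
    deepen: Callable[[Report], List[str]],
    max_rounds: int = 3,
) -> Report:
    """Alternate drafting and deepening until no new headings are proposed.

    Assumption: `draft` and `deepen` are placeholders for the agent's
    Evidence-Based Drafting and Reasoning-Driven Deepening behaviors.
    """
    report = Report(outline=[query])
    for _ in range(max_rounds):
        _draft_pending(report, draft)
        # Deepening step (stand-in): inspect the draft and propose outline additions.
        new_headings = [h for h in deepen(report) if h not in report.outline]
        if not new_headings:
            break
        report.outline.extend(new_headings)
    # Final pass so headings added in the last round are also drafted.
    _draft_pending(report, draft)
    return report


# Toy stand-ins, purely illustrative.
def toy_draft(heading: str) -> str:
    return f"text about {heading}"


def toy_deepen(report: Report) -> List[str]:
    # Propose one finer-grained heading until the outline has three entries.
    if len(report.outline) < 3:
        return [report.outline[-1] + " / detail"]
    return []


if __name__ == "__main__":
    result = warp_loop("Q", toy_draft, toy_deepen)
    for heading in result.outline:
        print(heading, "->", result.sections[heading])
```

In contrast to a plan-then-write pipeline, the outline here starts as a single heading and grows only in response to what has already been drafted; a real system would replace the toy callables with model calls and retrieval.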
