Deep research frameworks have shown promising capabilities in synthesizing comprehensive reports from web sources. While deep research has significant potential to address complex problems through planning and research cycles, existing frameworks lack adequate evaluation procedures and stage-specific safeguards. They typically reduce evaluation to exact-match question-answering accuracy, overlooking crucial aspects of report quality such as credibility, coherence, breadth, depth, and safety. As a result, hazardous or malicious sources can be integrated into the final report. To address this, we introduce DeepResearchGuard, a framework featuring four-stage safeguards with open-domain evaluation, and DRSafeBench, a novel stage-wise safety benchmark. Evaluated across GPT-4o, o4-mini, Gemini-2.5-flash, DeepSeek-v3, and GPT-5, DeepResearchGuard improves defense success rates by 16.53% while reducing the over-refusal rate to 6%. Through extensive experiments, we show that DRSafeBench enables comprehensive open-domain evaluation, and that stage-aware defenses effectively block the propagation of harmful content while systematically improving report quality without excessive over-refusal.
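To make the idea of stage-aware defenses concrete, the following is a minimal sketch of a pipeline in which each stage's output must pass a safety check before it propagates to later stages. The stage names (`plan`, `search`, `synthesize`, `report`) and the keyword-based guards are hypothetical stand-ins for illustration only; the abstract states only that there are four stage-specific safeguards, not what the stages or checks are.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stage names; the paper only states there are four stages.
STAGES = ["plan", "search", "synthesize", "report"]


@dataclass
class GuardedPipeline:
    """Runs a per-stage safety check before passing content onward.

    Each guard returns True if the stage's intermediate output is safe;
    a failing guard blocks propagation to all later stages.
    """
    guards: Dict[str, Callable[[str], bool]]

    def run(self, stage_outputs: Dict[str, str]) -> Tuple[List[str], Optional[str]]:
        cleared: List[str] = []
        for stage in STAGES:
            if not self.guards[stage](stage_outputs[stage]):
                return cleared, stage  # blocked at this stage
            cleared.append(stage)
        return cleared, None  # all stages passed


def no_keyword(bad: str) -> Callable[[str], bool]:
    # Toy guard standing in for a real safety classifier.
    return lambda text: bad not in text.lower()


pipeline = GuardedPipeline(guards={s: no_keyword("exploit") for s in STAGES})

cleared, blocked_at = pipeline.run({
    "plan": "outline report sections",
    "search": "gather cited web sources",
    "synthesize": "draft containing exploit instructions",  # unsafe draft
    "report": "final report text",
})
# blocked_at is "synthesize": the unsafe draft never reaches the report stage.
```

The design point this illustrates is that a single end-of-pipeline check would only see the final report, whereas per-stage guards can stop harmful material at the stage where it first appears.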