With the rapid growth of scientific literature, scientific question answering (SciQA) has become increasingly critical for exploring and utilizing scientific knowledge. Retrieval-Augmented Generation (RAG) enhances LLMs by incorporating knowledge from external sources, thereby providing credible evidence for scientific question answering. However, existing retrieval and reranking methods remain vulnerable to passages that are semantically similar but logically irrelevant, which often reduces factual reliability and amplifies hallucinations. To address this challenge, we propose a Deep Evidence Reranking Agent (DeepEra) that integrates step-by-step reasoning, enabling more precise evaluation of candidate passages beyond surface-level semantics. To support systematic evaluation, we construct SciRAG-SSLI (Scientific RAG - Semantically Similar but Logically Irrelevant), a large-scale dataset comprising about 300K SciQA instances across 10 subjects, built from a corpus of 10M scientific documents. The dataset combines naturally retrieved contexts with systematically generated distractors to test logical robustness and factual grounding. Comprehensive evaluations confirm that our approach achieves superior retrieval performance compared to leading rerankers. To our knowledge, this work is the first to comprehensively study and empirically validate the non-negligible SSLI issue in two-stage RAG frameworks.