Traditional Retrieval-Augmented Generation (RAG) effectively supports single-hop question answering with large language models but faces significant limitations in multi-hop question answering tasks, which require combining evidence from multiple documents. Existing chunk-based retrieval often provides irrelevant and logically incoherent context, leading to incomplete evidence chains and incorrect reasoning during answer generation. To address these challenges, we propose SentGraph, a sentence-level graph-based RAG framework that explicitly models fine-grained logical relationships between sentences for multi-hop question answering. Specifically, we construct a hierarchical sentence graph offline by first adapting Rhetorical Structure Theory to distinguish nucleus and satellite sentences, and then organizing them into topic-level subgraphs with cross-document entity bridges. During online retrieval, SentGraph performs graph-guided evidence selection and path expansion to retrieve fine-grained sentence-level evidence. Extensive experiments on four multi-hop question answering benchmarks demonstrate the effectiveness of SentGraph, validating the importance of explicitly modeling sentence-level logical dependencies for multi-hop reasoning.
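The two stages described above (offline graph construction, then online graph-guided selection and path expansion) can be illustrated with a minimal sketch. This is not the SentGraph implementation: the `Sentence` type, the functions `build_graph` and `retrieve`, the hand-assigned nucleus/satellite roles and entity sets, and the token-overlap seed scoring are all assumptions made for illustration. Topic-level subgraphs are omitted for brevity.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    sid: str
    doc: str
    text: str
    role: str            # "nucleus" or "satellite" (hand-labeled in this sketch)
    entities: frozenset  # entity mentions (hand-labeled in this sketch)


def build_graph(sentences):
    """Offline step: return an undirected adjacency map over sentence ids."""
    adj = defaultdict(set)
    nuclei = [s for s in sentences if s.role == "nucleus"]
    # Intra-document edges: attach each satellite to the nuclei of its document.
    for s in sentences:
        if s.role == "satellite":
            for n in nuclei:
                if n.doc == s.doc:
                    adj[s.sid].add(n.sid)
                    adj[n.sid].add(s.sid)
    # Cross-document entity bridges: link nuclei that share an entity.
    for i, a in enumerate(nuclei):
        for b in nuclei[i + 1:]:
            if a.doc != b.doc and a.entities & b.entities:
                adj[a.sid].add(b.sid)
                adj[b.sid].add(a.sid)
    return adj


def tokens(text):
    """Lowercased word set with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query, sentences, adj, num_seeds=1, max_hops=1):
    """Online step: pick seeds by token overlap, then expand along edges."""
    q = tokens(query)
    ranked = sorted(sentences, key=lambda s: len(q & tokens(s.text)), reverse=True)
    seeds = [s.sid for s in ranked[:num_seeds]]
    seen = dict.fromkeys(seeds, 0)  # sentence id -> hop distance from a seed
    queue = deque(seeds)
    while queue:
        cur = queue.popleft()
        if seen[cur] == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in sorted(adj[cur]):
            if nxt not in seen:
                seen[nxt] = seen[cur] + 1
                queue.append(nxt)
    return list(seen)


# Toy corpus: two documents bridged by the shared entity "Warsaw".
sentences = [
    Sentence("A1", "A", "Marie Curie was born in Warsaw.", "nucleus",
             frozenset({"Marie Curie", "Warsaw"})),
    Sentence("A2", "A", "She later moved to Paris.", "satellite",
             frozenset({"Paris"})),
    Sentence("B1", "B", "Warsaw is the capital of Poland.", "nucleus",
             frozenset({"Warsaw", "Poland"})),
    Sentence("B2", "B", "It lies on the Vistula river.", "satellite",
             frozenset({"Vistula"})),
]
graph = build_graph(sentences)
evidence = retrieve("Which country's capital was Marie Curie born in?",
                    sentences, graph)
print(evidence)
```

In this toy setup the seed is the sentence about Marie Curie, and one hop of expansion reaches the sentence about Poland only through the entity bridge on "Warsaw", which is the kind of cross-document evidence chain that chunk-level retrieval can miss.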