Recent progress in retrieval-augmented generation (RAG) has led to more accurate and interpretable multi-hop question answering (QA). Yet challenges persist in integrating iterative reasoning steps with external knowledge retrieval. To address this, we introduce StepChain GraphRAG, a framework that unites question decomposition with a Breadth-First Search (BFS) Reasoning Flow for enhanced multi-hop QA. Our approach first builds a global index over the corpus; at inference time, only the retrieved passages are parsed on-the-fly into a knowledge graph, and the complex query is split into sub-questions. For each sub-question, a BFS-based traversal dynamically expands along relevant edges, assembling explicit evidence chains without overwhelming the language model with superfluous context. Experiments on MuSiQue, 2WikiMultiHopQA, and HotpotQA show that StepChain GraphRAG achieves state-of-the-art Exact Match (EM) and F1 scores, lifting average EM by 2.57% and F1 by 2.13% over the previous best method, with the largest gain on HotpotQA (+4.70% EM, +3.44% F1). StepChain GraphRAG also improves explainability by preserving the chain-of-thought across intermediate retrieval steps. We conclude by discussing how future work can mitigate the computational overhead and address potential hallucinations from large language models, improving the efficiency and reliability of multi-hop QA.
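To make the BFS Reasoning Flow concrete, the sketch below is a minimal, hypothetical Python illustration (not the paper's implementation): it expands a toy knowledge graph breadth-first from a sub-question's anchor entity and returns the entity path as an explicit evidence chain. The graph contents, entity names, `is_answer` predicate, and `max_hops` cutoff are all illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): BFS over a toy knowledge
# graph to collect an evidence chain for one sub-question. Node names, the
# answer predicate, and max_hops are illustrative assumptions.
from collections import deque

def bfs_evidence_chain(graph, start_entity, is_answer, max_hops=3):
    """Return the first entity path from start_entity to a node satisfying
    is_answer, expanding breadth-first along graph edges."""
    queue = deque([(start_entity, [start_entity])])
    visited = {start_entity}
    while queue:
        node, path = queue.popleft()
        if is_answer(node):
            return path                      # explicit evidence chain
        if len(path) > max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None                              # no supporting chain found

# Toy graph standing in for triples parsed on-the-fly from retrieved passages.
kg = {
    "Inception": ["Christopher Nolan"],
    "Christopher Nolan": ["London"],
}
print(bfs_evidence_chain(kg, "Inception", lambda n: n == "London"))
# ['Inception', 'Christopher Nolan', 'London']
```

In the full framework, each returned path would be passed to the language model as the evidence chain for its sub-question rather than the entire retrieved context.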