Recent advances in large language models (LLMs) have led to significant improvements across a range of natural language processing tasks, but knowledge-intensive complex question answering remains challenging for LLMs due to their weaknesses in reasoning planning and their tendency to hallucinate. A typical solution is to employ retrieval-augmented generation (RAG) coupled with chain-of-thought (CoT) reasoning, which decomposes a complex question into chain-like sub-questions and applies iterative RAG to each sub-question. However, prior works exhibit sub-optimal reasoning planning and overlook dynamic knowledge retrieval from heterogeneous sources. In this paper, we propose AtomR, a novel heterogeneous knowledge reasoning framework that conducts multi-source reasoning at the atomic level. Drawing inspiration from graph-based knowledge modeling, AtomR leverages LLMs to decompose complex questions into combinations of three atomic knowledge operators, significantly enhancing the reasoning process at both the planning and execution stages. We also introduce BlendQA, a novel evaluation benchmark tailored to assessing complex heterogeneous knowledge reasoning. Experiments show that AtomR significantly outperforms state-of-the-art baselines across three single-source and two multi-source reasoning benchmarks, with notable performance gains of 9.4% on 2WikiMultihop and 9.5% on BlendQA.