UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph

Multi-hop Question Answering over Knowledge Graph~(KGQA) aims to find the answer entities on a large-scale Knowledge Graph~(KG) that are multiple hops away from the topic entities mentioned in a natural language question. To cope with the vast search space, existing work usually adopts a two-stage approach: it first retrieves a relatively small subgraph related to the question, and then performs reasoning on this subgraph to find the answer entities accurately. Although these two stages are highly related, previous work develops the retrieval and reasoning models with very different technical solutions, neglecting how closely the two tasks are related in essence. In this paper, we propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning. For the model architecture, UniKGQA consists of a semantic matching module based on a pre-trained language model~(PLM) for question-relation semantic matching, and a matching information propagation module that propagates the matching information along the directed edges of the KG. For parameter learning, we design a shared pre-training task based on question-relation matching for both the retrieval and reasoning models, and then propose retrieval- and reasoning-oriented fine-tuning strategies. Compared with previous studies, our approach is more unified, tightly relating the retrieval and reasoning stages. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method on the multi-hop KGQA task. Our code and data are publicly available at~\url{https://github.com/RUCAIBox/UniKGQA}.
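To make the two architectural modules more concrete, here is a minimal Python sketch. It is not the authors' implementation, and the simplifications are assumptions. Hand-written 3-dimensional vectors stand in for PLM encodings, and question-relation matching is plain cosine similarity. Matching information is reduced to one scalar score per entity, which is pushed from the topic entity along head-to-tail edges and renormalized at each hop. The helper names `cosine`, `relation_match` and `propagate`, the `top_k` pruning option, the toy triples and the embeddings are all hypothetical and exist only for illustration. The paper's modules operate on learned matching representations rather than scalars.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relation_match(question_vec, relation_vecs):
    """Score each relation against the question.

    Hypothetical stand-in for the PLM-based semantic matching module:
    negative similarities are clipped to 0 so they cannot carry score.
    """
    return {rel: max(0.0, cosine(question_vec, vec)) for rel, vec in relation_vecs.items()}


def propagate(triples, topic, match, hops, top_k=None):
    """Push scores from the topic entity along directed head -> tail edges.

    At each hop, a tail entity collects score(head) * match(relation) over its
    incoming edges, and the scores are normalized to sum to 1. If `top_k` is
    set (hypothetical option), only the k best entities survive each hop,
    which mimics using the same scorer to prune a subgraph for retrieval.
    """
    scores = {topic: 1.0}
    for _ in range(hops):
        nxt = defaultdict(float)
        for head, rel, tail in triples:
            if head in scores:
                nxt[tail] += scores[head] * match.get(rel, 0.0)
        total = sum(nxt.values())
        if total == 0.0:
            return {}  # no relevant path continues from the current entities
        scores = {ent: s / total for ent, s in nxt.items()}
        if top_k is not None:
            kept = sorted(scores, key=scores.get, reverse=True)[:top_k]
            scores = {ent: scores[ent] for ent in kept}
    return scores


# Toy KG and hand-made "embeddings" (assumptions, not produced by a PLM).
triples = [
    ("Tom Hanks", "acted_in", "Forrest Gump"),
    ("Tom Hanks", "acted_in", "Cast Away"),
    ("Tom Hanks", "born_in", "Concord"),
    ("Forrest Gump", "directed_by", "Robert Zemeckis"),
    ("Cast Away", "directed_by", "Robert Zemeckis"),
    ("Forrest Gump", "release_year", "1994"),
]
relation_vecs = {
    "acted_in": [1.0, 0.0, 0.0],
    "directed_by": [0.0, 1.0, 0.0],
    "born_in": [0.0, 0.0, 1.0],
    "release_year": [0.0, 0.2, 1.0],
}
# Toy encoding of "Who directed the movies that Tom Hanks acted in?":
# it mixes the "acting" and "directing" directions of the toy space.
question_vec = [1.0, 1.0, 0.0]

match = relation_match(question_vec, relation_vecs)
scores = propagate(triples, "Tom Hanks", match, hops=2)
print(max(scores, key=scores.get))  # → Robert Zemeckis
```

Normalizing after every hop keeps the scores a distribution over the current frontier. Because relation matching is computed once and reused at every hop, the same scorer could in principle drive both the pruning (retrieval) and the final ranking (reasoning). That reuse loosely echoes the unification the abstract describes, though the actual model learns these components jointly.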

翻译：多跳问答旨在从大规模知识图谱中，定位与自然语言问题中提及的主题实体相隔多个跳步的答案实体。为应对庞大的搜索空间，现有工作通常采用两阶段方法：先检索与问题相关的较小子图，再在该子图上进行推理以精确找出答案实体。尽管这两个阶段高度相关，但以往工作对检索模型与推理模型采用了截然不同的技术方案，忽略了它们在任务本质上的关联性。本文提出UniKGQA——一种面向多跳知识图谱问答任务的新方法，通过在模型架构与参数学习两个层面统一检索与推理。在模型架构上，UniKGQA包含基于预训练语言模型的语义匹配模块，用于实现问题与关系的语义匹配，以及匹配信息传播模块，用于沿知识图谱的有向边传播匹配信息。在参数学习上，我们为检索与推理模型设计了基于问题-关系匹配的共享预训练任务，并提出了面向检索与推理的微调策略。与以往研究相比，我们的方法更具统一性，紧密关联了检索与推理阶段。在三个基准数据集上的大量实验验证了该方法在多跳知识图谱问答任务上的有效性。我们的代码与数据已公开于：\url{https://github.com/RUCAIBox/UniKGQA}。