Question answering over knowledge graphs and other RDF data has advanced greatly, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but such systems barely utilize semantic data and knowledge. This paper presents the first system for complex questions that can seamlessly operate over a mixture of RDF datasets and text corpora, or over individual sources, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly by retrieving question-relevant evidence from the RDF data and/or a text corpus, using fine-tuned BERT models. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input using a graph algorithm for Group Steiner Trees that identifies the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that UNIQORN significantly outperforms state-of-the-art methods for QA over heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.
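To make the Group Steiner Tree idea concrete, the following is a minimal sketch on a toy context graph. It is not the UNIQORN implementation: the graph, the edge weights (lower is assumed to mean stronger evidence), the terminal groups, and all function names are hypothetical, and the exact tree computation is replaced by a simple shortest-path heuristic that tries every node as a meeting point. BERT-based retrieval is not modelled at all. Each group holds the graph nodes matching one part of the question, and the tree must touch at least one node per group.

```python
import heapq

# Hypothetical toy context graph: (node, node, weight); lower weight = stronger evidence.
TOY_EDGES = [
    ("Christopher Nolan", "Inception", 0.2),
    ("Inception", "Leonardo DiCaprio", 0.3),
    ("Jonathan Nolan", "Westworld", 0.4),
    ("Westworld", "Leonardo DiCaprio", 0.9),  # made-up noisy edge
    ("Christopher Nolan", "Jonathan Nolan", 0.8),
]
# One group per question term; every group must be covered by the tree.
TOY_GROUPS = [{"Christopher Nolan", "Jonathan Nolan"}, {"Leonardo DiCaprio"}]


def build_graph(weighted_edges):
    """Build an undirected adjacency map {node: {neighbour: weight}}."""
    graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph, source):
    """Return (dist, prev) for shortest paths from source."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def approx_group_steiner_tree(graph, groups):
    """Heuristic: for each root, join it to the nearest member of every group
    along shortest paths and keep the cheapest resulting tree."""
    best = None
    for root in sorted(graph):
        dist, prev = dijkstra(graph, root)
        edges, feasible = set(), True
        for group in groups:
            reachable = [t for t in group if t in dist]
            if not reachable:
                feasible = False
                break
            node = min(reachable, key=lambda t: (dist[t], t))
            while node != root:  # walk the shortest path back to the root
                edges.add(frozenset((node, prev[node])))
                node = prev[node]
        if not feasible:
            continue
        cost = sum(graph[u][v] for u, v in map(tuple, edges))
        nodes = {root} | {n for e in edges for n in e}
        if best is None or cost < best[0]:
            best = (cost, nodes)
    return best


def answer_candidates(nodes, groups):
    """Tree nodes that belong to no terminal group (an assumed candidate rule)."""
    return sorted(n for n in nodes if not any(n in g for g in groups))


if __name__ == "__main__":
    toy_graph = build_graph(TOY_EDGES)
    cost, nodes = approx_group_steiner_tree(toy_graph, TOY_GROUPS)
    print(cost, answer_candidates(nodes, TOY_GROUPS))
```

On this toy input the cheapest tree links "Christopher Nolan" and "Leonardo DiCaprio" through "Inception", which is the only non-terminal node and therefore the only candidate under the assumed rule. The heuristic gives an upper bound on the optimal tree cost; an exact Group Steiner Tree solver would be needed to guarantee optimality on larger graphs.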