Question answering over knowledge graphs and other RDF data has advanced considerably, with several effective techniques providing crisp answers to natural-language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but they cannot compute answers that are present only in text. Conversely, techniques from the IR and NLP communities have addressed QA over text, but such systems barely utilize semantic data and knowledge. This paper presents a method for complex questions that can seamlessly operate over a mixture of RDF datasets and text corpora, or over individual sources, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly by retrieving question-relevant evidence from the RDF data and/or a text corpus, using fine-tuned BERT models. The resulting graph typically contains all question-relevant evidence but also a lot of noise. UNIQORN copes with this input by using a graph algorithm for Group Steiner Trees, which identifies the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that UNIQORN significantly outperforms state-of-the-art methods for heterogeneous QA. The graph-based methodology provides user-interpretable evidence for the complete answering process.
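To make the Group Steiner Tree (GST) step more concrete, the following is a minimal sketch under stated assumptions, not UNIQORN's actual implementation. It assumes a context graph given as weighted, undirected edges, and "terminal groups," where each group holds the nodes that match one cue from the question. It uses a simple shortest-path heuristic: for every candidate root, connect the root to the nearest node of each group and keep the cheapest resulting tree. Non-terminal nodes in that tree are treated as answer candidates. The function names and the toy graph are hypothetical and serve only as illustration.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, source):
    """Single-source shortest paths; returns (dist, pred) dictionaries."""
    dist = {source: 0.0}
    pred = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


def approx_group_steiner_tree(edges, groups):
    """Hypothetical heuristic: try every node as root, attach the nearest
    member of each terminal group via shortest paths, keep the cheapest tree.
    Returns (cost, root, tree_nodes), or None if no root reaches all groups."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    best = None
    for root in list(adj):
        dist, pred = dijkstra(adj, root)
        tree_edges = set()
        feasible = True
        for group in groups:
            reachable = [t for t in group if t in dist]
            if not reachable:
                feasible = False
                break
            node = min(reachable, key=lambda t: dist[t])
            # Walk back along the shortest-path tree; paths from one root
            # share this predecessor tree, so their union is itself a tree.
            while pred[node] is not None:
                tree_edges.add((pred[node], node))
                node = pred[node]
        if not feasible:
            continue
        # Each tree edge (u, v) has weight dist[v] - dist[u] on the path tree.
        cost = sum(dist[v] - dist[u] for u, v in tree_edges)
        nodes = {root} | {n for edge in tree_edges for n in edge}
        if best is None or cost < best[0]:
            best = (cost, root, nodes)
    return best


# Toy context graph (hypothetical) for a question such as
# "Which film directed by Nolan stars Cillian Murphy?"
edges = [
    ("Nolan", "Inception", 1.0),
    ("Cillian Murphy", "Inception", 1.0),
    ("Nolan", "Interstellar", 1.0),
    ("Interstellar", "Matthew McConaughey", 1.0),
    ("Cillian Murphy", "Peaky Blinders", 1.0),
]
groups = [{"Nolan"}, {"Cillian Murphy"}]

cost, root, tree_nodes = approx_group_steiner_tree(edges, groups)
answers = tree_nodes - set().union(*groups)
print(cost, sorted(tree_nodes), answers)
```

This heuristic runs one Dijkstra search per node, which is acceptable for small context graphs. An exact GST algorithm, or one that returns the top-k trees and ranks answers across them, would be needed to match the behavior described in the abstract more closely.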