Knowledge from diverse application domains is organized as knowledge graphs (KGs) stored in RDF engines that are accessible on the web via SPARQL endpoints. Expressing a well-formed SPARQL query requires information about the graph structure and the exact URIs of its components, which is impractical for the average user. Question answering (QA) systems assist by translating natural language questions into SPARQL. Existing QA systems are typically based on application-specific, human-curated rules, or require prior information, expensive pre-processing, and model adaptation for each targeted KG. They are therefore hard to generalize to a broad set of applications and KGs. In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. Instead of curated rules, KGQAn introduces a novel formalization of question understanding as a text generation problem, converting a question into an intermediate abstract representation via a neural sequence-to-sequence model. We also develop a just-in-time linker that maps the abstract representation to a SPARQL query for a specific KG at query time, using only the publicly accessible APIs and the existing indices of the RDF store, without requiring any pre-processing. Our experiments with several real KGs demonstrate that KGQAn is easily deployed and outperforms the state of the art by a large margin in terms of answer quality and processing time, especially for arbitrary KGs unseen during training.
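To make the two-stage idea concrete, the following is a minimal sketch of how a KG-independent abstract representation might be turned into SPARQL once phrases have been linked to URIs. It is an illustration only, not KGQAn's implementation: the `TriplePattern` class, the `link` and `to_sparql` functions, the dictionary lookups (which stand in for whatever the linker would retrieve from the RDF store at query time), and the `example.org` URIs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TriplePattern:
    """One edge of a hypothetical abstract representation: phrases, not URIs."""
    subject: str   # a variable such as "?answer" or an entity phrase
    relation: str  # a relation phrase taken from the question
    obj: str       # a variable or an entity phrase


def link(term: str, lookup: Dict[str, str]) -> str:
    """Resolve a phrase to a bracketed URI; variables pass through unchanged."""
    if term.startswith("?"):
        return term
    return f"<{lookup[term.lower()]}>"


def to_sparql(
    patterns: List[TriplePattern],
    entity_lookup: Dict[str, str],
    relation_lookup: Dict[str, str],
    target: str = "?answer",
) -> str:
    """Build a SELECT query from abstract triple patterns and linked URIs."""
    lines = []
    for p in patterns:
        s = link(p.subject, entity_lookup)
        r = link(p.relation, relation_lookup)
        o = link(p.obj, entity_lookup)
        lines.append(f"  {s} {r} {o} .")
    body = "\n".join(lines)
    return f"SELECT DISTINCT {target} WHERE {{\n{body}\n}}"


# Assumed output of question understanding for "Who founded Intel?".
patterns = [TriplePattern("Intel", "founded by", "?answer")]

# Hypothetical phrase-to-URI mappings; a real linker would obtain
# candidates from the target KG at query time instead of a fixed dict.
entities = {"intel": "http://example.org/resource/Intel"}
relations = {"founded by": "http://example.org/ontology/foundedBy"}

query = to_sparql(patterns, entities, relations)
print(query)
```

The point of the sketch is the separation of concerns: the triple patterns contain only phrases and variables, so they do not depend on any particular KG, while everything KG-specific is confined to the lookup step that runs at query time.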