Knowledge from diverse application domains is organized as knowledge graphs (KGs) that are stored in RDF engines and accessible on the web via SPARQL endpoints. Expressing a well-formed SPARQL query requires information about the graph structure and the exact URIs of its components, which is impractical for the average user. Question answering (QA) systems assist by translating natural language questions into SPARQL. Existing QA systems are typically based on application-specific, human-curated rules, or they require prior information, expensive pre-processing, and model adaptation for each targeted KG. Therefore, they are hard to generalize to a broad set of applications and KGs. In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. Instead of curated rules, KGQAn introduces a novel formalization of question understanding as a text generation problem, in which a neural sequence-to-sequence model converts a question into an intermediate abstract representation. We also develop a just-in-time linker that, at query time, maps the abstract representation to a SPARQL query for a specific KG, using only the publicly accessible APIs and the existing indices of the RDF store, without requiring any pre-processing. Our experiments with several real KGs demonstrate that KGQAn is easily deployed and outperforms the state of the art by a large margin in terms of answer quality and processing time, especially for arbitrary KGs unseen during training.
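The sketch below makes the two-stage pipeline concrete: an abstract representation is produced first, and it is linked to a specific KG only when the query is issued. It is an illustrative sketch under stated assumptions, not KGQAn's implementation. It assumes the abstract representation is a list of triple patterns whose slots are either variables (prefixed with `?`) or natural-language phrases. It also replaces the linker's query-time calls to the RDF store's indices with a hypothetical in-memory dictionary `LEXICON`. The names `link_phrase` and `to_sparql` and the DBpedia-style URIs are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical abstract representation: a triple pattern of phrases and variables,
# e.g. what a seq2seq model might emit for "Who is the spouse of Barack Obama?"
Triple = Tuple[str, str, str]

# Hypothetical stand-in for the RDF store's existing text index (assumption):
# maps a phrase to ranked candidate URIs. A real linker would query the endpoint.
LEXICON: Dict[str, List[str]] = {
    "barack obama": ["http://dbpedia.org/resource/Barack_Obama"],
    "spouse": ["http://dbpedia.org/ontology/spouse"],
}


def link_phrase(phrase: str) -> str:
    """Return a SPARQL term for one slot: variables pass through, phrases are linked."""
    if phrase.startswith("?"):
        return phrase
    candidates = LEXICON.get(phrase.lower())
    if not candidates:
        raise KeyError(f"no candidate URI for phrase: {phrase!r}")
    # Take the top-ranked candidate; a fuller linker could try several in turn.
    return f"<{candidates[0]}>"


def to_sparql(triples: List[Triple], answer_var: str) -> str:
    """Build a SELECT query from the linked triple patterns."""
    patterns = " ".join(
        f"{link_phrase(s)} {link_phrase(p)} {link_phrase(o)} ." for s, p, o in triples
    )
    return f"SELECT DISTINCT {answer_var} WHERE {{ {patterns} }}"


abstract = [("Barack Obama", "spouse", "?answer")]
print(to_sparql(abstract, "?answer"))
```

The abstract representation refers only to phrases and never to KG-specific URIs. Any resolution of phrases to URIs happens inside `link_phrase` at query time, which reflects the abstract's claim that no per-KG pre-processing is needed before a question can be answered.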