Existing KBQA methods have traditionally relied on multi-stage pipelines involving tasks such as entity linking, subgraph retrieval, and query structure generation. However, multi-stage approaches depend on the accuracy of preceding steps, leading to cascading errors and increased inference time. Although a few studies have explored end-to-end models, these often suffer from lower accuracy and generate inoperative queries that are not supported by the underlying data. Furthermore, most prior approaches are limited to static training data, potentially overlooking the evolving nature of knowledge bases over time. To address these challenges, we present SPARKLE, a novel end-to-end natural language to SPARQL framework. Notably, SPARKLE leverages the structure of the knowledge base directly during decoding, effectively integrating knowledge into query generation. Our study reveals that simply referencing the knowledge base during inference significantly reduces the generation of inexecutable queries. SPARKLE achieves new state-of-the-art results on SimpleQuestions-Wiki and the highest F1 score on LCQuAD 1.0 (among models not using gold entities), while obtaining a slightly lower result on the WebQSP dataset. Finally, we demonstrate SPARKLE's fast inference speed and its ability to adapt when the knowledge base differs between the training and inference stages.
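To illustrate the general idea of consulting the knowledge base during decoding, the following is a minimal sketch and not SPARKLE's actual implementation. It assumes a toy set of triples, a hypothetical token vocabulary in which entities and relations are single tokens, and hand-written per-step scores standing in for a sequence-to-sequence model. At each step, only tokens consistent with the knowledge base are eligible, so the decoder cannot emit a relation that the chosen entity does not have.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Toy knowledge base of (subject, predicate, object) triples.
# All identifiers here are hypothetical and chosen for illustration only.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Q_Paris", "P_country", "Q_France"),
    ("Q_Paris", "P_population", "L_2100000"),
    ("Q_France", "P_capital", "Q_Paris"),
]


def build_relation_index(triples: Sequence[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each subject entity to the set of relations it has in the KB."""
    index: Dict[str, Set[str]] = {}
    for subject, predicate, _ in triples:
        index.setdefault(subject, set()).add(predicate)
    return index


def allowed_next(prefix: Sequence[str], index: Dict[str, Set[str]]) -> Set[str]:
    """Return the tokens that keep the partial query consistent with the KB."""
    if not prefix:
        return set(index)  # any entity that appears as a subject
    if len(prefix) == 1:
        return set(index.get(prefix[0], set()))  # only relations of that entity
    return {"<eos>"}  # this toy query has the form: entity, relation, end


def constrained_greedy_decode(
    scores_per_step: List[Dict[str, float]], index: Dict[str, Set[str]]
) -> List[str]:
    """Greedy decoding where KB-inconsistent tokens are masked out at each step."""
    output: List[str] = []
    for scores in scores_per_step:
        candidates = allowed_next(output, index)
        valid = {tok: s for tok, s in scores.items() if tok in candidates}
        if not valid:
            break  # no KB-consistent continuation is available
        best = max(valid, key=valid.get)
        if best == "<eos>":
            break
        output.append(best)
    return output


# Stand-in model scores: at step 2 the unconstrained argmax would be
# "P_capital", which Q_Paris does not have in the toy KB.
EXAMPLE_SCORES: List[Dict[str, float]] = [
    {"Q_Paris": 0.9, "Q_France": 0.1},
    {"P_capital": 0.6, "P_country": 0.3, "P_population": 0.1},
    {"<eos>": 1.0},
]

if __name__ == "__main__":
    kb_index = build_relation_index(TRIPLES)
    print(constrained_greedy_decode(EXAMPLE_SCORES, kb_index))
    # prints ['Q_Paris', 'P_country']
```

Because the relation index is rebuilt from whatever triples are available at inference time, swapping in an updated set of triples changes the permitted outputs without retraining, which is the kind of adaptability the abstract describes.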