Knowledge Graph Question Answering (KGQA) simplifies querying the vast amounts of knowledge stored in a graph-based model using natural language. However, research has largely concentrated on English, putting non-English speakers at a disadvantage. Moreover, existing multilingual KGQA systems struggle to match the performance of their English counterparts, highlighting the difficulty of generating SPARQL queries from diverse languages. In this work, we propose a simplified approach to enhancing multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model. Unlike existing methods that rely on separate encoders to integrate auxiliary information, our strategy leverages a single pretrained multilingual transformer-based language model to handle both the primary input and the auxiliary data. This significantly improves the model's ability to accurately convert a natural language question into the corresponding SPARQL query, and it demonstrates promising results on the most recent QALD datasets, QALD-9-Plus and QALD-10. Furthermore, we introduce and evaluate our approach on Chinese and Japanese, thereby expanding the language diversity of the existing datasets.