Large-scale pre-trained language models (PLMs) such as BERT have recently achieved great success and become a milestone in natural language processing (NLP). The NLP community has largely converged on adopting PLMs as the backbone for downstream tasks. Recent work on knowledge graph question answering (KGQA) likewise treats BERT or its variants as an essential component of KGQA models. However, a comprehensive study comparing the performance of different PLMs on KGQA is still lacking. To this end, we summarize two basic PLM-based KGQA frameworks that use no additional neural network modules, and use them to compare nine PLMs in terms of accuracy and efficiency. In addition, we present three benchmarks for larger-scale KGs, built on the popular SimpleQuestions benchmark, to investigate the scalability of PLMs. We carefully analyze the results of all PLM-based basic KGQA frameworks on these benchmarks and on two other popular datasets, WebQuestionSP and FreebaseQA, and find that knowledge distillation techniques and knowledge enhancement methods in PLMs are promising for KGQA. Furthermore, we test ChatGPT, which has drawn a great deal of attention in the NLP community, and demonstrate both its impressive capabilities and its limitations in zero-shot KGQA. We have released the code and benchmarks to promote the use of PLMs on KGQA.