In recent years, neural machine translation (NMT) for SPARQL query generation has grown significantly. Combining the copy mechanism with traditional encoder-decoder architectures, and using pre-trained encoder-decoders and large language models, has set new performance benchmarks. This paper presents experiments that replicate and extend recent NMT-based SPARQL generation studies. We compare pre-trained language models (PLMs), non-pre-trained language models (NPLMs), and large language models (LLMs), highlight the impact of question annotation and the copy mechanism, and test several fine-tuning methods for LLMs. In particular, we provide a systematic error analysis of the models and test their generalization ability. Our study shows that the copy mechanism yields significant performance gains for most PLMs and NPLMs. Annotating the data is pivotal to generating correct URIs, with the "tag-within" strategy emerging as the most effective approach. Our findings also reveal that the primary source of errors is incorrect URIs in SPARQL queries: base models sometimes replace the correct URIs with hallucinated ones. The copy mechanism prevents such hallucinations, but models that use it sometimes select the wrong URI among the candidates. Finally, the tested LLMs fell short of the desired performance.