Program induction (PI) has become a promising paradigm for using knowledge bases (KBs) to help large language models (LLMs) answer complex knowledge-intensive questions. Nonetheless, PI typically relies on a large number of parallel question-program pairs to make the LLM aware of the schema of a given KB, which makes it difficult to apply to the many low-resourced KBs that lack annotated data. To this end, we propose KB-Plugin, a plug-and-play framework that enables LLMs to induce programs over any low-resourced KB. First, KB-Plugin adopts self-supervised learning to encode the detailed schema information of a given KB into a pluggable module, namely the schema plugin. Second, KB-Plugin utilizes abundant annotated data from a rich-resourced KB to train another pluggable module, namely the PI plugin, which helps the LLM extract question-relevant schema information from the schema plugin of any KB and use this information to induce programs over that KB. Experiments on five heterogeneous KBQA datasets show that KB-Plugin achieves better or comparable performance compared to SoTA PI methods for low-resourced KBs while using a 25$\times$ smaller backbone LLM, and even approaches the performance of supervised methods. Our code and data are available at https://github.com/THU-KEG/KB-Plugin.
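The abstract does not say how a plugin is parameterized. As a minimal sketch, assuming each plugin is an additive low-rank weight update on a frozen layer (in the style of LoRA), the toy code below illustrates only the plug-and-play composition: one PI plugin stays attached while the schema plugin is swapped per KB. All names (`LowRankPlugin`, `PluggableLinear`, the slot labels, the numeric values) are hypothetical, and how the plugins are trained is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on plain Python lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


@dataclass
class LowRankPlugin:
    """Hypothetical pluggable module: an additive low-rank update B @ A (assumed form)."""
    name: str
    a: Matrix  # shape (rank, d_in)
    b: Matrix  # shape (d_out, rank)

    def delta(self, x: Vector) -> Vector:
        # Project x down to the rank-r space, then back up to d_out.
        return matvec(self.b, matvec(self.a, x))


class PluggableLinear:
    """A frozen linear layer whose output is shifted by whatever plugins are attached."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # frozen backbone weight, never updated here
        self.plugins: Dict[str, LowRankPlugin] = {}

    def plug(self, slot: str, plugin: LowRankPlugin) -> None:
        self.plugins[slot] = plugin  # replaces any plugin already in this slot

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)
        for plugin in self.plugins.values():
            y = [yi + di for yi, di in zip(y, plugin.delta(x))]
        return y


# Toy usage: one shared PI plugin, one schema plugin per KB (all values made up).
layer = PluggableLinear([[1.0, 0.0], [0.0, 1.0]])
pi_plugin = LowRankPlugin("pi", a=[[1.0, 0.0]], b=[[0.0], [1.0]])
schema_kb_a = LowRankPlugin("schema_kb_a", a=[[0.0, 1.0]], b=[[1.0], [0.0]])
schema_kb_b = LowRankPlugin("schema_kb_b", a=[[0.5, 0.5]], b=[[0.0], [2.0]])

x = [1.0, 2.0]
layer.plug("pi", pi_plugin)
layer.plug("schema", schema_kb_a)
out_a = layer.forward(x)  # [3.0, 3.0]
layer.plug("schema", schema_kb_b)  # swap KBs without touching the PI plugin
out_b = layer.forward(x)  # [1.0, 6.0]
```

Because each plugin contributes an independent additive term here, swapping the schema slot leaves the PI plugin and the backbone untouched. This toy says nothing about whether a PI plugin trained with one KB's schema plugin actually transfers to another; that is the question the paper's training procedure addresses.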