The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. Because large-scale question annotation is expensive, KBQG methods for low-resource scenarios urgently need to be developed. However, current methods rely heavily on annotated data for fine-tuning, which makes them ill-suited to few-shot question generation. Large Language Models (LLMs) have shown impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, an in-context learning strategy for reasoning, we formulate KBQG as a reasoning problem in which the generation of a complete question is split into a series of sub-question generation steps. Our proposed prompting method, KQG-CoT, first retrieves supportive logical forms from the unlabeled data pool, taking into account the characteristics of the logical form. Then, we write a prompt that makes explicit the reasoning chain for generating complicated questions based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ by sorting the logical forms by their complexity. We conduct extensive experiments on three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method surpasses the existing few-shot SoTA results on the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
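The demonstration-selection and sorting steps can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how logical forms are compared or how complexity is measured, so the following are all assumptions: logical forms are S-expressions, structural similarity is a Jaccard overlap of operator names, complexity is the number of nested sub-expressions, and demonstrations are ordered simple to complex. Likewise, `select_demonstrations`, `build_prompt`, and the prompt wording are hypothetical.

```python
import re
from typing import List, Set, Tuple


def operators(lf: str) -> Set[str]:
    # Hypothetical: treat the token right after each "(" as an operator name.
    return set(re.findall(r"\(\s*([A-Za-z_]+)", lf))


def complexity(lf: str) -> int:
    # Hypothetical complexity proxy: number of (nested) sub-expressions.
    return lf.count("(")


def structural_similarity(a: str, b: str) -> float:
    # Jaccard overlap of operator sets; a stand-in for the paper's retrieval criterion.
    oa, ob = operators(a), operators(b)
    if not oa and not ob:
        return 1.0
    return len(oa & ob) / len(oa | ob)


def select_demonstrations(pool: List[str], target: str, k: int) -> List[str]:
    # Keep the k unlabeled logical forms most structurally similar to the target
    # (Python's sort is stable, so ties keep their pool order)...
    ranked = sorted(pool, key=lambda lf: structural_similarity(lf, target), reverse=True)
    chosen = ranked[:k]
    # ...then order them from simple to complex (KQG-CoT+-style sorting; direction assumed).
    return sorted(chosen, key=complexity)


def build_prompt(demos: List[Tuple[str, str]], target: str) -> str:
    # Each demo pairs a logical form with a hand-written chain of sub-questions
    # (hypothetical format); the target ends the prompt so the LLM continues the chain.
    parts = [f"Logical form: {lf}\n{chain}\n" for lf, chain in demos]
    parts.append(f"Logical form: {target}\nLet's generate the question step by step.")
    return "\n".join(parts)
```

For example, given a small pool and the target `(JOIN located_in (JOIN capital Germany))`, `select_demonstrations(pool, target, 2)` keeps the two pool entries that use only `JOIN` and returns the shallower one first. Sorting after retrieval leaves the choice of demonstrations unchanged and only controls their order in the prompt.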