Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. Because large-scale question annotation is expensive, KBQG methods for low-resource scenarios are urgently needed. However, current methods rely heavily on annotated data for fine-tuning, which makes them poorly suited to few-shot question generation. Large Language Models (LLMs) have shown impressive generalization ability on few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, an in-context learning strategy for reasoning, we formulate the KBQG task as a reasoning problem in which the generation of a complete question is split into a series of sub-question generation steps. Our proposed prompting method, KQG-CoT, first retrieves supportive logical forms from the unlabeled data pool, taking the characteristics of the logical forms into account. We then write a prompt that makes explicit the reasoning chain for generating complicated questions, based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ by sorting the logical forms by their complexity. We conduct extensive experiments on three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, KQG-CoT+ surpasses the existing few-shot SoTA results on the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
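To make the prompt-construction idea concrete, the following is a minimal sketch of how demonstrations might be ordered by complexity and rendered into a chain-of-thought prompt. It is not the authors' implementation: the `Demo` class, the `complexity` proxy (counting opening parentheses), the prompt field labels, and the example logical forms are all assumptions introduced here for illustration, and the retrieval step that selects demonstrations from the unlabeled pool is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One hand-written demonstration (hypothetical structure)."""
    logical_form: str
    subquestions: List[str]  # intermediate steps of the reasoning chain
    question: str            # the complete question


def complexity(logical_form: str) -> int:
    # Assumed proxy for complexity: the number of opening parentheses,
    # i.e. how many operators the logical form contains.
    return logical_form.count("(")


def build_prompt(demos: List[Demo], target: str) -> str:
    # Order demonstrations from simple to complex (the KQG-CoT+ idea),
    # then spell out each reasoning chain as numbered subquestions.
    ordered = sorted(demos, key=lambda d: complexity(d.logical_form))
    blocks = []
    for demo in ordered:
        lines = ["Logical form: " + demo.logical_form]
        for i, sub in enumerate(demo.subquestions, start=1):
            lines.append("Subquestion %d: %s" % (i, sub))
        lines.append("Question: " + demo.question)
        blocks.append("\n".join(lines))
    # Leave the target open so the LLM continues the chain itself.
    blocks.append("Logical form: " + target + "\nSubquestion 1:")
    return "\n\n".join(blocks)


# Illustrative, made-up logical forms and questions.
demos = [
    Demo(
        "(AND film (JOIN directed_by (JOIN spouse_of person_a)))",
        ["Who is the spouse of person A?",
         "Which films were directed by the spouse of person A?"],
        "Which films were directed by the spouse of person A?",
    ),
    Demo(
        "(JOIN born_in person_b)",
        ["Where was person B born?"],
        "Where was person B born?",
    ),
]

print(build_prompt(demos, "(JOIN capital_of (JOIN born_in person_c))"))
```

The resulting string would be sent to an LLM, which is expected to continue from `Subquestion 1:` and finish with a complete question. A different complexity measure or prompt layout could be swapped in without changing the overall shape of the sketch.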