Open-Domain Question Answering (ODQA) aims to answer factoid questions without being given specific background documents. The task is harder in a zero-shot setting, since no data is available to train customized models such as Retriever-Readers. Recently, Large Language Models (LLMs) like GPT-3 have shown strong zero-shot ODQA performance with direct prompting, but direct prompting invokes the model's knowledge only implicitly and therefore falls well short of exploiting the full capability of LLMs. In this paper, we propose a Self-Prompting framework that explicitly utilizes the massive knowledge stored in the parameters of LLMs and their strong instruction-understanding abilities. Concretely, we prompt LLMs step by step to generate multiple pseudo QA pairs, together with background passages and explanations, from scratch, and then use these generated elements as demonstrations for in-context learning. Experimental results show that our method significantly surpasses previous SOTA methods on three widely used ODQA datasets, and even achieves performance comparable to some Retriever-Reader models fine-tuned on full training data.
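The two stages described above (generating pseudo QA data, then answering with in-context demonstrations) could be sketched roughly as follows. This is a minimal illustration, not the implementation used in the paper: the `llm` callable, the prompt wording, the order of the generation steps, and all function names are assumptions made for the example, and any selection or filtering of demonstrations is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class PseudoExample:
    passage: str
    question: str
    answer: str
    explanation: str


def generate_topics(llm: LLM, n: int) -> List[str]:
    """Ask the model for n topics, one per line (assumed prompt wording)."""
    raw = llm(f"List {n} well-known topics, one per line.")
    topics = [line.strip() for line in raw.splitlines() if line.strip()]
    return topics[:n]


def generate_pseudo_example(llm: LLM, topic: str) -> PseudoExample:
    """Build one pseudo QA pair step by step; the step order is an assumption."""
    passage = llm(f"Write a short factual passage about {topic}.")
    answer = llm(f"Passage: {passage}\nName one entity from this passage.")
    question = llm(
        f"Passage: {passage}\nAnswer: {answer}\n"
        "Write a question whose answer is the entity above."
    )
    explanation = llm(
        f"Passage: {passage}\nQuestion: {question}\nAnswer: {answer}\n"
        "Explain in one sentence why this answer is correct."
    )
    return PseudoExample(passage, question, answer, explanation)


def build_prompt(demos: List[PseudoExample], question: str) -> str:
    """Format the generated examples as in-context demonstrations."""
    blocks = [
        f"Passage: {d.passage}\nQuestion: {d.question}\n"
        f"Answer: {d.answer}\nExplanation: {d.explanation}"
        for d in demos
    ]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


def self_prompt_answer(llm: LLM, question: str, n_demos: int = 4) -> str:
    """Generate demonstrations with the model, then answer the real question."""
    demos = [generate_pseudo_example(llm, t) for t in generate_topics(llm, n_demos)]
    return llm(build_prompt(demos, question)).strip()
```

Because `llm` is just a callable, the sketch can be exercised with a stub function before wiring in a real model client.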