The increasing scale of large language models (LLMs) gives rise to emergent abilities on complex tasks that require reasoning, such as arithmetic and commonsense reasoning. The design of task-specific prompts is known to be critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-answering tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for every task. This paper proposes a new method, Active-Prompt, that adapts LLMs to different tasks using task-specific example prompts annotated with human-designed CoT reasoning. To this end, we address the key problem of determining which questions from a pool of task-specific queries are the most important and helpful to annotate. Borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics that characterize uncertainty and use them to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of the proposed method, which achieves state-of-the-art results on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and the accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.
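To make the selection step concrete, the following is a minimal Python sketch of uncertainty-based question selection, not the authors' implementation. It assumes that each question in the pool has already been answered several times by an LLM, and it scores uncertainty with two illustrative metrics (disagreement and entropy over the sampled answers). The function names `disagreement`, `entropy`, and `select_most_uncertain`, as well as the toy pool, are hypothetical and chosen only for illustration; the abstract does not specify the exact metric definitions.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


def disagreement(answers: Sequence[str]) -> float:
    """Fraction of distinct answers among the sampled answers (assumed metric)."""
    if not answers:
        raise ValueError("at least one sampled answer is required")
    return len(set(answers)) / len(answers)


def entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical answer distribution (assumed metric)."""
    if not answers:
        raise ValueError("at least one sampled answer is required")
    total = len(answers)
    counts = Counter(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_most_uncertain(
    sampled: Dict[str, Sequence[str]],
    n: int,
    metric: Callable[[Sequence[str]], float] = disagreement,
) -> List[str]:
    """Return the n questions whose sampled answers are most uncertain."""
    scores = {question: metric(answers) for question, answers in sampled.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy pool: question id -> answers from repeated LLM queries (hypothetical data).
pool = {
    "q1": ["4", "4", "4", "4"],  # the model always agrees with itself
    "q2": ["7", "9", "7", "8"],  # three distinct answers
    "q3": ["1", "2", "3", "5"],  # every answer differs
    "q4": ["6", "6", "6", "2"],  # two distinct answers
}

to_annotate = select_most_uncertain(pool, n=2)
```

In this toy pool, `q3` and `q2` receive the highest scores under either metric, so they would be passed to human annotators to write CoT exemplars. Passing `metric=entropy` swaps the scoring function without changing the selection logic.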