The increasing scale of large language models (LLMs) gives rise to emergent abilities on complex tasks that require reasoning, such as arithmetic and commonsense reasoning. The design of task-specific prompts is known to be critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-answering tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for every task. This paper proposes a new method, Active-Prompt, which adapts LLMs to different tasks using task-specific example prompts annotated with human-designed CoT reasoning. To this end, we address the key problem of determining which questions from a pool of task-specific queries are the most important and helpful to annotate. Borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics that characterize uncertainty and use them to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of the proposed method, which achieves state-of-the-art performance on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and the accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.
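The uncertainty-based selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the metrics, so the two used here (disagreement among sampled answers, and the entropy of the answer distribution) are assumptions drawn from common active-learning practice. The function names `disagreement`, `entropy`, and `select_most_uncertain`, and the toy answer pool, are hypothetical and are not taken from the released code.

```python
import math
from collections import Counter


def disagreement(answers):
    """Assumed metric: fraction of distinct answers among the samples."""
    return len(set(answers)) / len(answers)


def entropy(answers):
    """Assumed metric: Shannon entropy (in nats) of the answer distribution."""
    total = len(answers)
    counts = Counter(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_most_uncertain(pool, n, metric=disagreement):
    """Return the n questions whose sampled answers score highest on `metric`.

    `pool` maps each question to a list of answers sampled from the model
    for that same question (hypothetical data layout).
    """
    ranked = sorted(pool, key=lambda q: metric(pool[q]), reverse=True)
    return ranked[:n]


# Toy pool: four sampled answers per question (illustrative values only).
sampled_answers = {
    "q1": ["12", "12", "12", "12"],  # the model always agrees with itself
    "q2": ["7", "9", "7", "8"],      # partial disagreement
    "q3": ["3", "4", "5", "6"],      # every sample differs
}

# Questions chosen for human CoT annotation under each assumed metric.
selected = select_most_uncertain(sampled_answers, n=2)
selected_by_entropy = select_most_uncertain(sampled_answers, n=2, metric=entropy)
```

In this sketch, the questions returned by `select_most_uncertain` would be passed to human annotators, and the resulting CoT exemplars would form the task-specific prompt. Any function that maps a list of sampled answers to a score can be passed as `metric`.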