Chain-of-thought (CoT) prompting advances the reasoning abilities of large language models (LLMs) and achieves superior performance on arithmetic, commonsense, and symbolic reasoning tasks. However, most CoT studies rely on carefully designed, human-annotated rationale chains to prompt the language model, which poses challenges for real-world applications where labeled training data is available but human-annotated rationale chains are not. This creates a barrier to applying CoT prompting to such general tasks. This paper proposes a new strategy, Automate-CoT (Automatic Prompt Augmentation and Selection with Chain-of-Thought), that bypasses human engineering of CoTs by automatically augmenting rationale chains from a small labeled dataset and then pruning low-quality chains, based on the labels, to construct a candidate pool of machine-generated rationale chains. Finally, it selects the optimal combination of several rationale chains from the pool for CoT prompting, employing a variance-reduced policy gradient strategy to estimate the significance of each example in a black-box language model. Automate-CoT enables quick adaptation of the CoT technique to different tasks. Experimental results demonstrate the effectiveness of our method, with state-of-the-art results achieved on arithmetic reasoning (+2.7%), commonsense reasoning (+3.4%), symbolic reasoning (+3.2%), and non-reasoning tasks (+2.5%). Our code will be available at https://github.com/shizhediao/automate-cot.
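The pruning and selection steps can be illustrated with a minimal Python sketch. It is not the released Automate-CoT implementation. It assumes that pruning keeps a generated chain only when its final answer matches the gold label, that each prompt slot has its own categorical distribution over the candidate pool, and that variance reduction is done by subtracting the mean reward of the sampled prompts. The names `prune`, `select_exemplars`, and `reward_fn` are hypothetical. In practice `reward_fn` would query the black-box model with the assembled prompt and score it on held-out labeled data; here it is left as a caller-supplied function.

```python
import math
import random


def prune(candidates, gold):
    """Keep only generated chains whose final answer matches the gold label.

    Assumption: each candidate is a dict with hypothetical keys
    "qid" (question id) and "answer" (the chain's final answer).
    """
    return [c for c in candidates if c["answer"] == gold[c["qid"]]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_exemplars(pool_size, k, reward_fn, steps=300, samples=8,
                     lr=1.0, seed=0):
    """Pick k pool indices with a score-function (REINFORCE) estimator.

    reward_fn takes a list of k pool indices and returns a scalar reward.
    It is treated as a black box: no gradients flow through it.
    Requires samples >= 2 because of the (samples - 1) normalization.
    """
    rng = random.Random(seed)
    # logits[i][j]: preference for placing pool item j in prompt slot i.
    logits = [[0.0] * pool_size for _ in range(k)]
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample several candidate prompts from the current policy.
        draws = [[rng.choices(range(pool_size), weights=p)[0] for p in probs]
                 for _ in range(samples)]
        rewards = [reward_fn(d) for d in draws]
        # Variance reduction (assumed form): subtract the batch mean reward.
        baseline = sum(rewards) / samples
        for draw, reward in zip(draws, rewards):
            advantage = (reward - baseline) / (samples - 1)
            for i, j in enumerate(draw):
                for m in range(pool_size):
                    # Gradient of log softmax w.r.t. logit m for sampled j.
                    grad = (1.0 if m == j else 0.0) - probs[i][m]
                    logits[i][m] += lr * advantage * grad
    # Final prompt: the most preferred pool item for each slot.
    return [max(range(pool_size), key=row.__getitem__) for row in logits]
```

The only signal the selector uses is the scalar reward of each sampled prompt, which is what makes this kind of estimator applicable when the language model exposes no gradients.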