Chain-of-thought (CoT) advances the reasoning abilities of large language models (LLMs) and achieves superior performance on complex reasoning tasks. However, most CoT studies rely on carefully designed, human-annotated rationale chains to prompt LLMs, which poses challenges for real-world applications where labeled data is available but rationale chains are not. This paper proposes a new strategy, Automate-CoT (Automatic Prompt Augmentation and Selection with Chain-of-Thought), that bypasses human engineering of CoT by automatically augmenting rationale chains from a small labeled dataset and then pruning low-quality chains, based on the labels, to construct a candidate pool of machine-generated rationale chains. Finally, it selects the optimal combination of several rationale chains from the pool for CoT prompting, using a variance-reduced policy gradient strategy to estimate the significance of each example. Automate-CoT enables quick adaptation of the CoT technique to different tasks. Experimental results demonstrate the effectiveness of our method, with competitive results on arithmetic reasoning (+2.7%), commonsense reasoning (+3.4%), symbolic reasoning (+3.2%), and non-reasoning tasks (+2.5%). The code is available at https://github.com/SHUMKASHUN/Automate-CoT.
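The pruning and selection steps can be illustrated with a minimal sketch. It assumes each of K exemplar slots in the prompt is filled by sampling from its own categorical distribution over the pruned pool, and that a prompt is scored by a black-box loss (in practice, something like the LLM's error on a small labeled set). Here a toy loss built from made-up per-chain quality scores stands in for that evaluation. The names `prune_by_label`, `select_with_vrpg`, and the quality scores are hypothetical, and the mean-baseline estimator shown is one standard form of variance-reduced policy gradient, not necessarily the exact estimator used in the released code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical record for a machine-generated exemplar:
# (question, generated rationale chain, answer extracted from the chain, gold label)
Candidate = Tuple[str, str, str, str]


def prune_by_label(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Keep only chains whose final answer agrees with the gold label."""
    return [c for c in candidates if c[2].strip() == c[3].strip()]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_with_vrpg(
    pool_size: int,
    num_slots: int,
    loss_fn: Callable[[Tuple[int, ...]], float],
    steps: int = 500,
    samples: int = 10,  # must be >= 2 for the 1/(samples - 1) factor
    lr: float = 0.5,
    seed: int = 0,
) -> List[List[float]]:
    """Learn one categorical distribution per exemplar slot by minimising
    loss_fn with a mean-baseline (variance-reduced) policy gradient."""
    rng = random.Random(seed)
    logits = [[0.0] * pool_size for _ in range(num_slots)]
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample `samples` prompts; each slot independently picks one exemplar index.
        prompts = [
            tuple(rng.choices(range(pool_size), weights=probs[k])[0] for k in range(num_slots))
            for _ in range(samples)
        ]
        losses = [loss_fn(p) for p in prompts]
        baseline = sum(losses) / samples  # mean loss of the batch as the baseline
        for k in range(num_slots):
            grad = [0.0] * pool_size
            for prompt, loss in zip(prompts, losses):
                advantage = loss - baseline
                # Gradient of log softmax w.r.t. slot-k logits: onehot(choice) - probs.
                for n in range(pool_size):
                    indicator = 1.0 if prompt[k] == n else 0.0
                    grad[n] += advantage * (indicator - probs[k][n])
            # Descend the estimated gradient of the expected loss.
            for n in range(pool_size):
                logits[k][n] -= lr * grad[n] / (samples - 1)
    return [softmax(row) for row in logits]


# Toy usage: hypothetical usefulness scores for 5 pooled chains, 2 exemplar slots.
quality = [0.1, 0.95, 0.3, 0.4, 0.2]


def toy_loss(prompt: Tuple[int, ...]) -> float:
    return sum(1.0 - quality[i] for i in prompt)


probs = select_with_vrpg(pool_size=5, num_slots=2, loss_fn=toy_loss)
best = [max(range(5), key=row.__getitem__) for row in probs]
kept = prune_by_label([("q1", "r1", "42", "42"), ("q2", "r2", "7", "8")])
```

Subtracting the batch-mean loss leaves the gradient estimate unbiased (hence the 1/(I−1) factor) while reducing its variance. This matters because each loss evaluation in the real setting is an expensive LLM call, so only a few samples per step are affordable.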