Large language models (LLMs) have recently been shown to deliver impressive performance on various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations that enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate this manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three types of reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought prompting, and performs comparably to 8-shot CoT prompting on math reasoning problems. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
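Zero-shot-CoT, PS, and PS+ all build the model input the same way. They append a fixed trigger sentence to the target problem, and only that trigger changes between them. The Python sketch below illustrates this structure. Several parts are assumptions for illustration: the `Q:`/`A:` layout, the helper name `build_prompt`, and the exact PS and PS+ trigger wording. The PS trigger is modeled on the plan-then-execute idea described above, and the PS+ trigger on the more detailed instructions. Consult the linked repository for the prompts actually used in the experiments.

```python
# Illustrative sketch: the helper name and the PS/PS+ trigger wording are
# assumptions, not the authors' released prompts.

# Zero-shot-CoT trigger, as described in the abstract.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

# PS (assumed wording): first devise a plan, then carry it out step by step.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)

# PS+ (assumed wording): adds detailed instructions aimed at calculation errors
# and the quality of the generated reasoning steps.
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate variables (pay attention to correct numerical "
    "calculation and commonsense), solve the problem step by step, and show the answer."
)


def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate the target problem statement with a zero-shot trigger sentence."""
    return f"Q: {problem}\nA: {trigger}"


if __name__ == "__main__":
    question = "A store had 20 apples and sold 7. How many apples are left?"
    for name, trigger in [
        ("Zero-shot-CoT", ZERO_SHOT_COT_TRIGGER),
        ("PS", PS_TRIGGER),
        ("PS+", PS_PLUS_TRIGGER),
    ]:
        print(f"--- {name} ---")
        print(build_prompt(question, trigger))
```

All three variants need no hand-written demonstrations. That property is what separates them from few-shot CoT prompting, where worked examples would be prepended to the problem instead.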