This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs). Inspired by previous studies \cite{Min2022RethinkingWork}, we analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators, on the performance of GPT-3 on various tasks. Our findings show that incorrect CoT prompting leads to poor accuracy. Correct values in the CoT are crucial for predicting correct answers. In contrast, incorrect demonstrations in which the CoT operators or the CoT order are wrong do not degrade performance as drastically as value-based perturbations. This research deepens our understanding of CoT prompting and raises new questions about the capability of LLMs to learn reasoning in context.