With the help of Chain-of-Thought (CoT) prompting, Large Language Models (LLMs) have achieved remarkable performance on various reasoning tasks. However, most have been evaluated in noise-free contexts, and the tendency of LLMs to produce inaccurate results under noisy context has not been fully investigated. Existing studies use trigger sentences to encourage LLMs to concentrate on the relevant information, but such triggers have limited effect on final answer prediction. Inspired by interactive CoT methods, in which intermediate reasoning steps are elicited through multiple rounds of interaction between users and LLMs, we propose a novel prompting method, R$^3$ prompting, for CoT reasoning under noisy context. Specifically, R$^3$ prompting interacts with LLMs to perform key sentence extraction, variable declaration, and answer prediction, which correspond to a thought process of reviewing, rephrasing, and resolving. The response generated at each interaction serves as a hint that guides the response at the next interaction. Our experiments show that R$^3$ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context. With GPT-3.5-turbo, we observe an average accuracy improvement of 3.7% on these tasks compared to the most competitive prompting baseline. Further analyses and ablation studies demonstrate the robustness and generalization of R$^3$ prompting for LLM reasoning under noisy context.
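The review, rephrase, and resolve interaction described in the abstract can be sketched as a short chained loop. This is only an illustrative sketch, not the authors' implementation: the function name `r3_prompt`, the `llm` callable (any function mapping a prompt string to a response string), and the wording of the three stage instructions are all assumptions made here, and any exemplars the actual method may use are omitted.

```python
from typing import Callable, List


def r3_prompt(problem: str, llm: Callable[[str], str]) -> List[str]:
    """Run three chained interactions and return the responses in order.

    `llm` is a hypothetical stand-in for a model call; the stage
    instructions below are illustrative wording, not the paper's prompts.
    """
    stages = [
        "Review: extract the key sentences needed to solve the problem.",
        "Rephrase: declare the variables described by the key sentences.",
        "Resolve: use the declared variables to predict the final answer.",
    ]
    context = f"Problem: {problem}"
    responses: List[str] = []
    for instruction in stages:
        # Each prompt carries the problem plus all earlier responses as hints.
        reply = llm(f"{context}\n{instruction}")
        responses.append(reply)
        context = f"{context}\nHint: {reply}"
    return responses
```

The last element of the returned list is the answer-stage response. Passing `llm` as a parameter keeps the sketch independent of any particular model API, so it can be exercised with a stub function before being wired to a real model.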