Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting often leads to repetitive output and locally optimal solutions. To address this shortcoming, ensemble-optimization methods sample multiple reasoning paths and aggregate them into a final answer. However, current ensemble-optimization methods either rely on simple rule-based post-processing, such as \textit{self-consistency}, or train an additional model on a set of task-related human annotations to select the best of multiple reasoning paths. Both approaches fail to generalize to realistic settings where the type of input question or the answer format of the reasoning paths is unknown. To avoid these limitations, we propose \textbf{self-agreement}, a generalizable ensemble-optimization method applicable to almost all scenarios, whether or not the type of input question and the answer format of the reasoning paths are known. Self-agreement first samples from the language model's decoder to generate a \textit{diverse} set of reasoning paths, and then prompts the language model \textit{one more time} to determine the optimal answer by selecting the most \textit{agreed} answer among the sampled reasoning paths. Self-agreement achieves both remarkable performance on six public reasoning benchmarks and superior generalization capabilities.
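The two-stage procedure described in the abstract (sample diverse reasoning paths, then prompt the model once more to pick the most agreed answer) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `generate` callable, the prompt wording, the zero-shot CoT trigger, and the default values of `num_paths` and `temperature` are all assumptions introduced here.

```python
from typing import Callable, List

# Hypothetical model interface (an assumption, not from the paper):
# any function that maps (prompt, temperature) to generated text.
Generate = Callable[[str, float], str]


def sample_paths(generate: Generate, cot_prompt: str,
                 num_paths: int, temperature: float) -> List[str]:
    """Stage 1: sample several reasoning paths with a non-zero temperature."""
    return [generate(cot_prompt, temperature) for _ in range(num_paths)]


def build_agreement_prompt(question: str, paths: List[str]) -> str:
    """Stage 2 prompt: show all sampled paths and ask for the most agreed answer.

    The wording below is a placeholder, not the prompt used in the paper.
    """
    numbered = "\n\n".join(
        f"Reasoning path {i}:\n{path}" for i, path in enumerate(paths, start=1)
    )
    return (
        f"Question: {question}\n\n{numbered}\n\n"
        "The reasoning paths above may disagree. "
        "State the final answer that most of them agree on."
    )


def self_agreement(generate: Generate, question: str,
                   num_paths: int = 5, temperature: float = 0.7) -> str:
    """Sample diverse paths, then query the model one more time to select an answer."""
    # Assumed zero-shot CoT prompt; a few-shot CoT prompt would work the same way.
    cot_prompt = f"Q: {question}\nA: Let's think step by step."
    paths = sample_paths(generate, cot_prompt, num_paths, temperature)
    # The single extra call is made deterministically (temperature 0.0, an assumption).
    return generate(build_agreement_prompt(question, paths), 0.0)
```

Because the selection step is itself a prompt rather than a rule that parses answers, this sketch needs no answer-extraction logic, which is the property the abstract credits for generalizing to unknown question types and answer formats.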