Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding typically used in CoT prompting often produces repetitive and locally optimal reasoning. To address this shortcoming, ensemble optimization obtains multiple reasoning paths and aggregates them into a final answer. However, current ensemble-optimization methods either employ simple rule-based post-processing such as \textit{self-consistency}, or train an additional model on task-related human annotations to select the best among multiple reasoning paths; they therefore fail to generalize to realistic settings where the type of input questions or the answer format of reasoning paths is unknown. To avoid these limitations, we propose \textbf{Self-Agreement}, a generalizable ensemble-optimization method applicable to almost all scenarios, whether the type of input questions and the answer format of reasoning paths are known or unknown. Self-agreement first samples from the language model's decoder to generate a \textit{diverse} set of reasoning paths, and then prompts the language model \textit{one more time} to determine the optimal answer by selecting the most \textit{agreed} answer among the sampled paths. Self-agreement achieves remarkable performance on six public reasoning benchmarks while exhibiting superior generalization capability.
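The two-stage procedure described above can be sketched as follows; this is a minimal illustration, not the paper's implementation. `sample_reasoning_path` is a toy stand-in for a temperature-sampled LM call, and the vote-based `select_agreed_answer` approximates the paper's second LM prompt (all names here are hypothetical).

```python
import random
from collections import Counter

def sample_reasoning_path(question, rng):
    # Hypothetical stand-in for one temperature-sampled decoder call;
    # a real implementation would query the language model here.
    candidate_answers = ["18", "18", "26", "18", "9"]
    return f"...reasoning for {question!r}... so the answer is {rng.choice(candidate_answers)}"

def select_agreed_answer(question, paths):
    # Stage 2 of self-agreement: the paper prompts the LM one more time
    # with all sampled paths and asks for the most agreed answer. This
    # sketch approximates that selection with a majority vote over each
    # path's final token.
    finals = [p.rsplit(" ", 1)[-1] for p in paths]
    return Counter(finals).most_common(1)[0][0]

def self_agreement(question, n_paths=10, seed=0):
    rng = random.Random(seed)
    # Stage 1: sample a diverse set of reasoning paths from the decoder.
    paths = [sample_reasoning_path(question, rng) for _ in range(n_paths)]
    return select_agreed_answer(question, paths)
```

With several paths converging on the same final answer, the vote returns that answer even when individual paths differ in their intermediate reasoning.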