Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting by a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
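The sample-then-aggregate procedure described above can be illustrated with a minimal sketch. It assumes an unweighted majority vote over final answers as one simple way to aggregate the sampled reasoning paths. The names `Sampler`, `self_consistent_answer`, and `make_toy_sampler` are hypothetical, and the toy sampler returns canned reasoning paths in place of a real language model sampled at nonzero temperature.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Tuple

# Hypothetical sampler type: takes a question and returns one sampled
# (reasoning_text, final_answer) pair. In practice this would wrap a
# language model call with stochastic decoding; that part is not shown.
Sampler = Callable[[str], Tuple[str, str]]


def self_consistent_answer(
    sample_path: Sampler, question: str, num_paths: int = 5
) -> Tuple[str, Counter]:
    """Sample num_paths reasoning paths and return the most frequent answer.

    The reasoning text is discarded; only final answers are counted, which
    is an unweighted vote (an assumption of this sketch).
    """
    if num_paths < 1:
        raise ValueError("num_paths must be at least 1")
    votes: Counter = Counter()
    for _ in range(num_paths):
        _reasoning, answer = sample_path(question)
        votes[answer] += 1
    best_answer, _count = votes.most_common(1)[0]
    return best_answer, votes


def make_toy_sampler() -> Sampler:
    """Build a deterministic stand-in for a model (illustrative data only)."""
    canned = cycle(
        [
            ("16 - 3 - 4 = 9 eggs left; 9 * 2 = 18", "18"),
            ("3 + 4 = 7 eggs used; 16 - 7 = 9; 9 * 2 = 18", "18"),
            ("16 - 3 = 13; 13 * 2 = 26", "26"),  # path with a reasoning slip
            ("(16 - 7) * 2 = 18", "18"),
            ("16 - 4 - 3 = 9; 9 + 5 = 14", "14"),  # another faulty path
        ]
    )

    def sample(question: str) -> Tuple[str, str]:
        return next(canned)

    return sample


# Hypothetical question text, used only to exercise the function.
answer, votes = self_consistent_answer(
    make_toy_sampler(), "How much does she earn per day?", num_paths=5
)
print(answer, dict(votes))
```

In this toy run, three of the five sampled paths agree on the same final answer, so the vote selects it even though two paths contain errors. Greedy decoding, by contrast, would commit to a single path.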