Prior research on enhancing the reasoning capabilities of large language models (LLMs) has focused primarily on specific prompting techniques, such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often require labor-intensive manual prompt engineering. Our study takes a novel approach by asking: can LLMs reason effectively without prompting? We find that CoT reasoning paths can be elicited from pre-trained LLMs simply by altering the \textit{decoding} process. Rather than relying on conventional greedy decoding, we examine the top-$k$ alternative tokens and find that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' \textit{intrinsic} reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with higher confidence in the model's decoded answer, and that this confidence metric effectively differentiates CoT paths from non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding substantially outperforms standard greedy decoding.
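The decoding idea described above can be illustrated with a minimal sketch. The abstract does not spell out where the branching happens or how confidence is computed, so the following rests on stated assumptions: branching over the top-$k$ tokens happens only at the first decoding step, each branch is then continued greedily, and confidence is taken as the gap between the top-1 and top-2 token probabilities at the position of the final numeric answer token. The names `TOY_TABLE`, `toy_next_probs`, `greedy_continue`, and `cot_decode` are hypothetical, and the toy probability table stands in for a real pre-trained LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

EOS = "<eos>"
Dist = Dict[str, float]
NextProbs = Callable[[Sequence[str]], Dist]

# Hypothetical toy "model": maps a token prefix to a next-token distribution.
# It mimics a question whose correct answer is 8 but whose greedy answer is 5.
TOY_TABLE: Dict[Tuple[str, ...], Dist] = {
    (): {"Total": 0.6, "3": 0.4},
    ("Total",): {"5": 0.45, "8": 0.40, EOS: 0.15},
    ("3",): {"+": 0.9, EOS: 0.1},
    ("3", "+"): {"5": 0.9, "2": 0.1},
    ("3", "+", "5"): {"=": 0.95, EOS: 0.05},
    ("3", "+", "5", "="): {"8": 0.95, "5": 0.05},
}


def toy_next_probs(prefix: Sequence[str]) -> Dist:
    # Unknown prefixes end the sequence.
    return TOY_TABLE.get(tuple(prefix), {EOS: 1.0})


def top_k(dist: Dist, k: int) -> List[Tuple[str, float]]:
    # The k most probable (token, probability) pairs, most probable first.
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]


@dataclass
class Path:
    tokens: List[str]
    answer: Optional[str]  # last numeric token decoded after the branch token
    confidence: float      # top-1 minus top-2 probability at the answer token


def greedy_continue(next_probs: NextProbs, first_token: str, max_len: int) -> Path:
    # Continue greedily from a forced first token, tracking the answer margin.
    tokens = [first_token]
    answer, confidence = None, 0.0
    while len(tokens) < max_len:
        ranked = top_k(next_probs(tokens), 2)
        token, p1 = ranked[0]
        if token == EOS:
            break
        p2 = ranked[1][1] if len(ranked) > 1 else 0.0
        tokens.append(token)
        if token.isdigit():  # assumption: the answer is the last number decoded
            answer, confidence = token, p1 - p2
    return Path(tokens, answer, confidence)


def cot_decode(next_probs: NextProbs, k: int, max_len: int = 16) -> List[Path]:
    # Branch on the top-k first tokens, then rank the paths by confidence.
    branches = top_k(next_probs([]), k)
    paths = [greedy_continue(next_probs, tok, max_len) for tok, _ in branches]
    return sorted(paths, key=lambda p: p.confidence, reverse=True)


if __name__ == "__main__":
    for path in cot_decode(toy_next_probs, k=2):
        print(path.tokens, path.answer, round(path.confidence, 2))
```

In this toy setup, greedy decoding (`k=1`) commits to the direct answer `5` with a narrow probability margin, while `k=2` also surfaces the step-by-step path `3 + 5 = 8`, whose answer token has a much larger margin and is therefore ranked first. With a real model, the probability table would be replaced by the model's next-token distribution; the ranking logic is otherwise unchanged under the assumptions above.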