Large language models (LLMs) have attracted significant attention for their exceptional performance on a wide range of natural language processing tasks, but they suffer from hallucinations that degrade their performance. One promising way to improve LLM performance is to ask the model to revise its answer after generation, a technique known as self-correction. Of the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not rely on external knowledge. However, recent works have cast doubt on LLMs' ability to perform intrinsic self-correction. In this paper, we offer a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. We further identify two factors critical to successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theory underlying the self-correction behavior of LLMs and highlight the importance of unbiased prompts and zero-temperature settings in harnessing their full potential.
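To make the setup concrete, the following is a minimal sketch of an intrinsic self-correction loop under the two conditions named above (greedy decoding via temperature 0, and a neutral re-check prompt that does not presume the first answer is wrong). It assumes the OpenAI Python client; the model name, question, and prompt wording are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of intrinsic self-correction, assuming the OpenAI
# Python client. Model name and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(messages):
    # temperature=0 yields (near-)deterministic decoding, one of the
    # two factors the abstract identifies as critical.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat model works here
        messages=messages,
        temperature=0,
    )
    return resp.choices[0].message.content

question = "What is 17 * 24?"
messages = [{"role": "user", "content": question}]
answer = ask(messages)

# A "fair" (unbiased) follow-up: ask the model to re-examine its own
# answer without hinting that it is wrong, then commit to a final one.
# No external knowledge is supplied, so the correction is intrinsic.
messages += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": (
        "Review your previous answer. It may or may not be correct. "
        "If you find an error, fix it; otherwise keep it. "
        "State your final answer."
    )},
]
revised = ask(messages)
print(revised)
```

The key design point illustrated here is the prompt's neutrality: a biased follow-up such as "Your answer is wrong, try again" can push the model away from a correct first answer, whereas the neutral wording leaves both keeping and revising the answer open.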