Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations. To address these issues, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, have been initiated. They share a commonality: they involve LLMs evaluating and updating themselves. Nonetheless, these efforts lack a unified perspective for summarization, as existing surveys predominantly focus on categorization. In this paper, we adopt the unified perspective of internal consistency, which offers explanations for reasoning deficiencies and hallucinations. Internal consistency refers to the consistency among expressions of an LLM's latent, decoding, and response layers, assessed via sampling methodologies. We then introduce an effective theoretical framework capable of mining internal consistency, named Self-Feedback. The framework consists of two modules: Self-Evaluation and Self-Update. The former captures internal consistency signals, while the latter leverages these signals to enhance either the model's response or the model itself. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work, summarize relevant evaluation methods and benchmarks, and delve into the concern "Does Self-Feedback Really Work?" We also propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". The relevant resources are open-sourced at https://github.com/IAAR-Shanghai/ICSFSurvey.
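To make the two modules concrete, the sketch below is a minimal, response-layer instance of the Self-Feedback loop in the style of Self-Consistency; it is an illustrative assumption rather than the framework's formal definition, and `sample_response` is a hypothetical stand-in for any LLM sampling call.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_feedback_round(
    sample_response: Callable[[str], str],  # hypothetical LLM sampling call (assumption)
    question: str,
    num_samples: int = 8,
) -> Tuple[str, float]:
    """One Self-Evaluation + Self-Update round at the response layer.

    Self-Evaluation: sample several responses and measure their agreement,
    a simple response-layer internal consistency signal.
    Self-Update: keep the majority response, i.e. update the response
    rather than the model itself.
    """
    # Self-Evaluation: draw multiple samples for the same question.
    responses: List[str] = [sample_response(question) for _ in range(num_samples)]

    # Internal consistency signal: fraction of samples agreeing with the mode.
    answer, votes = Counter(responses).most_common(1)[0]
    consistency = votes / num_samples

    # Self-Update (response level): adopt the most consistent answer.
    return answer, consistency
```

In this reading, a high agreement ratio signals strong internal consistency, while a low one flags responses that may warrant refinement or model-level updates.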