Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations. To address these issues, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, have been initiated. They share a common trait: having LLMs evaluate and update themselves. Nonetheless, these efforts lack a unified summarizing perspective, as existing surveys predominantly focus on categorization. In this paper, we present a theoretical framework, Internal Consistency, which offers explanations for reasoning deficiencies and hallucinations. Internal Consistency refers to the consistency, assessed via sampling methodologies, among the expressions of an LLM's latent, decoding, and response layers. We then introduce an effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. This framework consists of two modules: Self-Evaluation and Self-Update. The former captures Internal Consistency Signals, while the latter leverages these signals to enhance either the model's response or the model itself. The framework has been employed in numerous studies. We systematically classify these studies by task and line of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, "Does Self-Feedback Really Work?" We also propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". The relevant resources are open-sourced at https://github.com/IAAR-Shanghai/ICSFSurvey.
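To make the two modules concrete, the following is a minimal sketch of a Self-Feedback loop in the response-layer setting, in the style of Self-Consistency. The `generate` function is a hypothetical stand-in for sampling multiple responses from an LLM; the agreement rate among samples serves as a simple Internal Consistency Signal. This is an illustrative assumption, not the paper's specific algorithm.

```python
from collections import Counter

def generate(prompt, n_samples=5):
    """Hypothetical stand-in for sampling n responses from an LLM.
    Returns canned answers here purely for illustration."""
    return ["42", "42", "41", "42", "42"][:n_samples]

def self_evaluate(responses):
    """Self-Evaluation: capture an Internal Consistency Signal as the
    agreement rate of the most frequent response across samples."""
    best, freq = Counter(responses).most_common(1)[0]
    return best, freq / len(responses)

def self_update(prompt, threshold=0.6):
    """Self-Update: accept the majority response only if the consistency
    signal is strong enough; otherwise flag it for further refinement."""
    responses = generate(prompt)
    answer, consistency = self_evaluate(responses)
    return answer if consistency >= threshold else None

print(self_update("What is 6 * 7?"))  # majority answer "42" at 0.8 consistency
```

Here Self-Update merely selects or rejects a response; in the broader framework the same signal can instead drive prompt refinement or parameter updates to the model itself.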