Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these issues, studies prefixed with ``Self-'', such as Self-Consistency, Self-Improve, and Self-Refine, have been initiated. They share a commonality: LLMs evaluating and updating themselves to mitigate the issues. Nonetheless, these efforts lack a unified summarizing perspective, as existing surveys predominantly focus on categorization without examining the motivations behind these works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses, via sampling methodologies, the coherence among LLMs' latent, decoding, and response layers. Building on the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, ``Does Self-Feedback Really Work?'' We propose several critical viewpoints, including the ``Hourglass Evolution of Internal Consistency'', the ``Consistency Is (Almost) Correctness'' hypothesis, and ``The Paradox of Latent and Explicit Reasoning''. Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data, available at \url{https://github.com/IAAR-Shanghai/ICSFSurvey}.