Large language models (LLMs) are restricted to reasoning in the "language space", where they typically express the reasoning process as a chain of thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens serve primarily textual coherence and are not essential for reasoning, while some critical tokens require complex planning and pose significant challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of natural language, we introduce a new paradigm, Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed a "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path as CoT does. Coconut outperforms CoT on certain logical reasoning tasks that require substantial backtracking during planning, while using fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
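The core mechanism described above (feed the last hidden state back as the next input embedding, rather than decoding it into a token) can be illustrated with a minimal toy sketch. This is a hypothetical, self-contained illustration, not the paper's implementation: `transformer_step` is a stand-in for one LLM forward pass, here a fixed toy linear map, and `reason_in_latent_space` is an illustrative name for the feedback loop.

```python
def transformer_step(embedding):
    """Stand-in for one LLM forward pass: maps an input embedding to the
    last hidden state. A fixed toy 2x2 linear map, not a real model."""
    weights = [[0.5, -0.2], [0.1, 0.3]]
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]

def reason_in_latent_space(question_embedding, num_latent_steps):
    """Run `num_latent_steps` continuous thoughts: each last hidden state
    is fed back directly as the next input embedding, with no token
    decoding in between (the key idea behind Coconut)."""
    thought = question_embedding
    trajectory = [thought]
    for _ in range(num_latent_steps):
        thought = transformer_step(thought)   # last hidden state ...
        trajectory.append(thought)            # ... becomes the next input
    return trajectory

trajectory = reason_in_latent_space([1.0, 0.0], num_latent_steps=3)
print(len(trajectory))  # 4 states: the question embedding plus 3 continuous thoughts
```

Because the intermediate states are vectors rather than single sampled tokens, each continuous thought can, in principle, superpose information about several alternative next steps, which is what enables the BFS-like search behavior discussed in the abstract.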