The extent to which decoder-only language models (LMs) engage in planning, that is, organizing intermediate computations to support coherent long-range generation, remains an important question, with implications for interpretability, reliability, and principled model design. Planning involves structuring computations over long horizons and considering multiple possible continuations, but how far transformer-based LMs exhibit these behaviors without external scaffolds such as chain-of-thought prompting is unclear. We address these questions by analyzing the hidden states at the core of transformer computations, which capture intermediate results and act as carriers of information. Since these hidden representations are redundant and encumbered with fine-grained details, we develop a pipeline based on vector-quantized variational autoencoders that compresses them into compact summary codes. These codes make it possible to measure mutual information and to analyze the computational structure underlying model behavior. Using this framework, we study planning in LMs across synthetic grammar tasks, path-finding tasks, and natural language datasets, focusing on two planning properties: (i) the planning horizon of pre-output computations, and (ii) the extent to which the model considers alternative valid continuations. As a separate downstream use of the same pipeline, we also analyze how decision-relevant information is distributed across layers and earlier prefix blocks when producing next-token predictions. Together, these analyses advance our understanding of planning in LMs and provide a general-purpose pipeline for inspecting internal model dynamics. Our results reveal that the effective planning horizon is task-dependent, that models implicitly preserve information about unused correct continuations, and that predictions draw most heavily on recent computations, though earlier blocks remain informative.
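To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of its two measurement ingredients: the vector-quantization step that maps continuous hidden states to discrete summary codes via nearest-codebook lookup (the VQ-VAE encoder and training loop are omitted), and a plug-in mutual-information estimate between two resulting code sequences. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def quantize(hidden, codebook):
    """Map each hidden-state vector to the index of its nearest codebook entry.

    hidden: (n, d) array of hidden states; codebook: (k, d) array of code vectors.
    Returns an (n,) array of discrete code indices.
    """
    # Squared Euclidean distance from every hidden vector to every code vector.
    d2 = ((hidden[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def mutual_information(x, y, kx, ky):
    """Plug-in MI estimate (in nats) between two discrete code sequences.

    x, y: aligned sequences of code indices; kx, ky: codebook sizes.
    """
    joint = np.zeros((kx, ky))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()                      # empirical joint distribution
    px = joint.sum(axis=1, keepdims=True)     # marginal over x-codes
    py = joint.sum(axis=0, keepdims=True)     # marginal over y-codes
    nz = joint > 0                            # avoid log(0) on empty cells
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())
```

In this framing, high mutual information between codes extracted at one position and codes summarizing a later continuation is the kind of signal one would read as evidence of a longer planning horizon.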