Chain-of-Thought (CoT) prompting empowers Large Language Models (LLMs) to tackle complex problems, but it remains constrained by computational cost and reasoning-path collapse when grounded in discrete token spaces. Recent latent reasoning approaches attempt to improve efficiency by performing reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and they often require a pre-defined number of latent steps at inference time. In this work, we introduce PLaT (Planning with Latent Thoughts), a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization. We model reasoning as a deterministic trajectory of latent planning states, while a separate Decoder grounds these thoughts into text when necessary. This decoupling allows the model to dynamically determine when to terminate reasoning rather than relying on a fixed hyperparameter. Empirical results on mathematical benchmarks reveal a distinct trade-off: while PLaT achieves lower greedy accuracy than baselines, it demonstrates superior scalability in terms of reasoning diversity. This indicates that PLaT learns a robust, broader solution space, offering a transparent and scalable foundation for inference-time search. Our code can be found at https://github.com/yunsaijc/PLaT.
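To make the decoupling concrete, the following is a minimal toy sketch (not the paper's implementation; all components here are random stand-ins for what would be trained networks) of the three ingredients the abstract names: a deterministic transition over latent planning states, a learned stop criterion that lets the model decide when to terminate rather than using a fixed step count, and a separate Decoder that grounds a latent thought into text only when needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for PLaT-style learned components (random weights here;
# in the actual framework these would be trained networks).
D = 8                                     # latent planning-state dimension
W_step = rng.normal(size=(D, D)) * 0.3    # deterministic latent transition
w_stop = rng.normal(size=D)               # linear "stop" scorer
vocab = ["think", "therefore", "answer"]  # toy vocabulary for grounding
W_dec = rng.normal(size=(len(vocab), D))  # separate Decoder head

def plan(h0, max_steps=16, stop_threshold=0.0):
    """Roll out a deterministic trajectory of latent planning states,
    terminating dynamically when the stop score crosses the threshold."""
    h, trajectory = h0, []
    for _ in range(max_steps):
        h = np.tanh(W_step @ h)           # deterministic latent step
        trajectory.append(h)
        if w_stop @ h > stop_threshold:   # model-decided termination
            break
    return trajectory

def decode(h):
    """Ground a single latent thought into a token (only when necessary)."""
    return vocab[int(np.argmax(W_dec @ h))]

states = plan(rng.normal(size=D))
tokens = [decode(h) for h in states]      # verbalize the trajectory on demand
print(len(states), tokens)
```

Because verbalization is a separate read-out, the latent rollout can be searched or resampled at inference time without ever decoding intermediate steps, which is the property the abstract highlights for inference-time search.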