Chain-of-Thought (CoT) is a critical technique for enhancing the reasoning ability of Large Language Models (LLMs), and latent reasoning methods have been proposed to accelerate the inefficient token-level reasoning chain. We observe that existing latent reasoning methods generally require augmenting the model architecture and extensive training, which limits their broader applicability. In this paper, we propose CoLT, a novel framework that implements latent reasoning as ``tool calls''. Instead of reasoning entirely in the latent space, CoLT generates seed tokens that encode the information of a reasoning step. When a latent tool call is triggered, a smaller external model takes the hidden states of the seed tokens as input and unpacks them back into a full reasoning step. In this way, the main model continues to reason in the explicit token space, preserving its reasoning ability while improving efficiency. Experimental results on four mathematical datasets demonstrate that CoLT achieves higher accuracy with shorter reasoning chains than baseline latent models, and is compatible with reinforcement learning algorithms and different decoder architectures.
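The latent-tool-call control flow described above can be sketched in miniature. This is a purely illustrative toy, not the paper's implementation: `SeedToken`, `main_model_step`, `latent_tool_call`, and the deterministic `embed` stand-in for real hidden states are all hypothetical names and simplifications, assuming a main model that emits compact seed tokens and a smaller external decoder that unpacks their hidden states into an explicit reasoning step.

```python
# Toy sketch of a CoLT-style latent tool call (all names are
# illustrative assumptions, not the authors' released code).
from dataclasses import dataclass
from typing import List

@dataclass
class SeedToken:
    hidden: List[float]  # hidden state emitted by the main model

def embed(text: str, dim: int = 4) -> List[float]:
    # Deterministic stand-in for a real hidden state: derive a small
    # vector from character codes so the example is reproducible.
    return [sum(ord(c) for c in text[j::dim]) % 97 / 97 for j in range(dim)]

def main_model_step(prompt: str, n_seeds: int = 2) -> List[SeedToken]:
    # Stand-in for the main LLM: instead of emitting a reasoning step
    # token by token, it emits a few compact seed tokens.
    return [SeedToken(embed(f"{prompt}#{i}")) for i in range(n_seeds)]

def latent_tool_call(seeds: List[SeedToken], vocab: List[str]) -> str:
    # Stand-in for the smaller external model: it "unpacks" the seed
    # hidden states back into an explicit (token-space) reasoning step.
    # A real decoder would attend over the hidden states; here each
    # hidden vector is mapped to a vocabulary item by a toy score.
    words = []
    for seed in seeds:
        score = sum(seed.hidden)              # score in [0, 4)
        words.append(vocab[int(score * 10) % len(vocab)])
    return " ".join(words)

vocab = ["add", "carry", "the", "digits", "then", "sum"]
seeds = main_model_step("13 + 29 = ?")        # main model emits seed tokens
step = latent_tool_call(seeds, vocab)         # external model unpacks them
print(step)
```

The key property the sketch mirrors is that the unpacked step lands back in the explicit token space, so the main model's subsequent reasoning never has to consume latent vectors directly.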