Reasoning with a chain of thought (CoT) enables Large Language Models (LLMs) to solve complex tasks but incurs significant inference costs due to the generation of long rationales. We propose Thinking States, a method that performs reasoning {\em while} the input is being processed. Specifically, Thinking States generates a short sequence of thinking tokens every few input tokens, maps the thoughts back into embedding space, and adds them to the embeddings of the following input tokens. This has two key advantages. First, it captures the recurrent nature of CoT, but with the thought tokens generated as the input is processed. Second, since the thoughts are represented as tokens, they can be learned from natural-language supervision with teacher forcing, which is parallelizable. Empirically, Thinking States outperforms other latent-reasoning methods on multiple reasoning tasks, narrowing the gap to CoT on math problems and matching its performance on 2-Hop QA at lower latency. On state-tracking tasks, we show that Thinking States yields stronger reasoning behavior than CoT, successfully extrapolating to sequences longer than those seen during training.
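The interleaved control flow described above can be sketched as follows. This is a minimal toy illustration of the idea, not the authors' implementation: all shapes, names (`generate_thoughts`, `process`, `chunk`, `n_thoughts`), and the averaged-embedding combination rule are illustrative assumptions, and a random linear map stands in for the actual transformer.

```python
import numpy as np

# Toy sketch of the Thinking States control flow (hypothetical names and
# shapes, not the paper's implementation): every `chunk` input tokens,
# greedily "generate" a few thought tokens, map them back to embedding
# space, and add the result to the next chunk's input embeddings.

rng = np.random.default_rng(0)
vocab, dim, chunk, n_thoughts = 32, 8, 4, 2
embed = rng.normal(size=(vocab, dim))           # shared token embedding table
unembed = embed.T                               # tied output projection

def generate_thoughts(state):
    """Greedily decode a short thought sequence from the current state."""
    toks = []
    for _ in range(n_thoughts):
        tok = int(np.argmax(state @ unembed))   # next thought token
        toks.append(tok)
        state = state + embed[tok]              # stand-in for a decoder step
    return toks, state

def process(input_ids):
    state = np.zeros(dim)
    thought_vec = np.zeros(dim)                 # carried into the next chunk
    all_thoughts = []
    for start in range(0, len(input_ids), chunk):
        ids = input_ids[start:start + chunk]
        # thoughts from the previous chunk are added to these input embeddings
        x = embed[ids] + thought_vec
        state = state + x.mean(axis=0)          # stand-in for transformer layers
        toks, state = generate_thoughts(state)
        all_thoughts.append(toks)
        thought_vec = embed[toks].mean(axis=0)  # thoughts back to embedding space
    return all_thoughts

thoughts = process(list(range(12)))             # 12 inputs -> 3 chunks of thoughts
```

Because the thoughts are ordinary token sequences, a training loop could supervise `generate_thoughts` with teacher forcing against natural-language targets, which is the parallelizability advantage noted above.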