To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain-of-thought reasoning steps before the final answer. However, although people reason effectively in natural language, LMs might reason more effectively with intermediate computation that is not in natural language. In this work, we explore an alternative reasoning approach: instead of explicitly producing the chain-of-thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning. The implicit reasoning steps are distilled from a teacher model trained on explicit chain-of-thought reasoning. Instead of reasoning "horizontally" by producing intermediate words one by one, the distilled model reasons "vertically" across the hidden states of different layers. We conduct experiments on a multi-digit multiplication task and a grade school math problem dataset and find that this approach solves tasks that previously could not be solved without explicit chain-of-thought, at a speed comparable to using no chain-of-thought.
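The "vertical" distillation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `select_teacher_states` and `layerwise_distillation_loss` are hypothetical, the even spacing of reasoning steps across layers is one plausible pairing rule, and mean squared error stands in for whatever objective is actually used. The abstract does not specify any of these details.

```python
import numpy as np


def select_teacher_states(teacher_states: np.ndarray) -> np.ndarray:
    """Pick one teacher hidden state per layer (hypothetical pairing rule).

    teacher_states has shape (num_layers, num_cot_steps, hidden_dim): the
    teacher's hidden state at every layer for every explicit reasoning token.
    Assumption: layer l is paired with a reasoning step spread evenly over
    the chain, so deeper layers are matched to later steps.
    """
    num_layers, num_steps, _ = teacher_states.shape
    # One evenly spaced step index per layer, from the first step to the last.
    steps = np.linspace(0, num_steps - 1, num_layers).round().astype(int)
    # Advanced indexing: row l takes the state at (layer l, step steps[l]).
    return teacher_states[np.arange(num_layers), steps]


def layerwise_distillation_loss(student_states: np.ndarray,
                                teacher_targets: np.ndarray) -> float:
    """Mean squared error between per-layer student states and teacher targets.

    Both arrays have shape (num_layers, hidden_dim). MSE is an assumed
    stand-in for the real training objective.
    """
    return float(np.mean((student_states - teacher_targets) ** 2))


# Toy usage: 4 layers, 7 explicit reasoning steps, hidden size 3.
teacher_states = np.arange(4 * 7 * 3).reshape(4, 7, 3)
targets = select_teacher_states(teacher_states)
loss = layerwise_distillation_loss(targets + 1, targets)
```

In this sketch the student never emits the 7 intermediate tokens. It only has to make its per-layer hidden states, computed in a single pass over the input, resemble the selected teacher states, which is why inference cost would stay close to the no-chain-of-thought setting.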