Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, for example through chain-of-thought reasoning. However, most existing approaches to enhancing this ability rely heavily on data-driven methods and neglect the structural aspects of the model's reasoning capacity. We find that while LLMs handle individual reasoning steps well, they struggle to maintain consistency across an entire reasoning chain. To address this, we introduce 'planning tokens' at the start of each reasoning step, which serve as a guide for the model. The embeddings of these tokens are fine-tuned along with the rest of the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate the method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements on three math word problem datasets relative to plain chain-of-thought fine-tuning baselines.
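As a rough illustration of the data format the abstract describes, the sketch below prefixes each reasoning step of a chain-of-thought target with one planning token. The token names, the `add_planning_tokens` helper, and the `toy_chooser` heuristic are all hypothetical; the abstract does not say which tokens are used or how a token is assigned to a step.

```python
from typing import Callable, List

# Hypothetical planning-token vocabulary; the abstract does not name the tokens.
PLANNING_TOKENS = ["<plan_add>", "<plan_mul>", "<plan_answer>"]


def add_planning_tokens(steps: List[str], choose_token: Callable[[str], str]) -> str:
    """Prefix every reasoning step with one planning token and join the steps."""
    annotated = []
    for step in steps:
        token = choose_token(step)
        if token not in PLANNING_TOKENS:
            raise ValueError(f"unknown planning token: {token}")
        annotated.append(f"{token} {step}")
    return "\n".join(annotated)


def toy_chooser(step: str) -> str:
    """Illustrative heuristic only: pick a token from the operator in the step."""
    if "*" in step:
        return "<plan_mul>"
    if "+" in step:
        return "<plan_add>"
    return "<plan_answer>"


# A toy chain-of-thought solution, split into its reasoning steps.
steps = ["3 * 4 = 12", "12 + 5 = 17", "The answer is 17"]
target = add_planning_tokens(steps, toy_chooser)
print(target)
```

In an actual fine-tuning setup, the planning tokens would also need to be added to the tokenizer vocabulary so that each one gets its own trainable embedding row. With Hugging Face Transformers, for instance, this could plausibly be done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, though the abstract does not state which tooling was used.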