Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, for example through chain-of-thought reasoning. However, most existing approaches to enhancing this ability rely heavily on data-driven methods and neglect the structural aspects of the model's reasoning capacity. We find that while LLMs handle individual reasoning steps well, they struggle to maintain consistency across an entire reasoning chain. To address this, we introduce planning tokens at the start of each reasoning step to serve as a guide for the model, and we add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate the method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements over standard fine-tuning baselines across three math word problem datasets.
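As a rough illustration of the idea of placing a planning token at the start of each reasoning step, the following minimal sketch prefixes every step of a solution with a special token. It is not the authors' implementation: the token names (`<plan_0>`, ...), the functions `make_planning_tokens`, `add_planning_tokens`, and `extra_param_fraction`, and the caller-supplied `assign_type` rule are all hypothetical, and the abstract does not say how the token for a given step is chosen. The parameter estimate assumes that each new token adds exactly one row to the input embedding table and nothing else.

```python
from typing import Callable, List


def make_planning_tokens(num_types: int) -> List[str]:
    """Create hypothetical planning-token strings, one per step type."""
    return [f"<plan_{i}>" for i in range(num_types)]


def add_planning_tokens(
    steps: List[str],
    assign_type: Callable[[str], int],
    tokens: List[str],
) -> List[str]:
    """Prefix each reasoning step with the planning token chosen for it.

    `assign_type` is an assumption of this sketch: any rule that maps a
    step to an index into `tokens`.
    """
    return [f"{tokens[assign_type(step)]} {step}" for step in steps]


def extra_param_fraction(
    num_tokens: int, hidden_size: int, total_params: int
) -> float:
    """Fraction of parameters added by the new token embeddings.

    Assumes one new embedding row of length `hidden_size` per token.
    """
    return (num_tokens * hidden_size) / total_params


if __name__ == "__main__":
    tokens = make_planning_tokens(2)
    # Toy assignment rule, for illustration only.
    rule = lambda step: 0 if "+" in step else 1
    for line in add_planning_tokens(["2 + 3 = 5", "5 * 4 = 20"], rule, tokens):
        print(line)
```

In an actual fine-tuning setup, the new token strings would also have to be registered with the tokenizer and the embedding table enlarged accordingly, so that the new rows become the additional trainable parameters.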