Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most existing approaches to enhancing this ability rely heavily on data-driven methods and neglect the structural aspects of the model's reasoning capacity. We find that while LLMs handle individual reasoning steps well, they struggle to maintain consistency across an entire reasoning chain. To address this, we introduce 'planning tokens' at the start of each reasoning step, which serve as a guide for the model. The embeddings of these tokens are fine-tuned along with the rest of the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate the method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements on three math word problem datasets relative to plain chain-of-thought fine-tuning baselines.
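As a rough illustration of the idea, the sketch below prefixes each step of a chain-of-thought solution with a planning token and counts the embedding parameters that new tokens would add. The token format `<plan_k>`, the function names, and the way planning-token ids are assigned to steps are hypothetical choices made for this example and are not taken from the method itself. The parameter count assumes one new input-embedding row per planning token.

```python
from typing import List


def add_planning_tokens(steps: List[str], plan_ids: List[int]) -> str:
    """Prefix every reasoning step with a planning token.

    `plan_ids[i]` selects which planning token precedes `steps[i]`.
    The `<plan_k>` spelling is an assumption made for this sketch.
    """
    if len(steps) != len(plan_ids):
        raise ValueError("need exactly one planning-token id per step")
    return " ".join(f"<plan_{k}> {step}" for step, k in zip(steps, plan_ids))


def extra_embedding_params(num_planning_tokens: int, hidden_size: int) -> int:
    """Number of new embedding weights, assuming one row per planning token."""
    return num_planning_tokens * hidden_size


# Hypothetical two-step solution to a math word problem.
steps = ["2 + 3 = 5", "5 * 4 = 20"]
augmented = add_planning_tokens(steps, plan_ids=[0, 1])
print(augmented)
```

In a real fine-tuning setup, the new tokens would also have to be added to the tokenizer vocabulary and the model's embedding matrix resized accordingly. The relative overhead then depends on the ratio of `extra_embedding_params(...)` to the model's total parameter count.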