Despite their impressive performance on a variety of complex tasks, modern large language models (LLMs) still struggle with some math problems that are simple and intuitive for humans, such as addition. Humans can easily learn the basic rules of addition and apply them to new problems of any length, but LLMs struggle to do the same. Instead, they may rely on similar "cases" seen in the training corpus. We define these two reasoning mechanisms as "rule-based reasoning" and "case-based reasoning". Since rule-based reasoning is essential for acquiring systematic generalization ability, we investigate whether transformers use rule-based or case-based reasoning for math problems. Through carefully designed intervention experiments on five math tasks, we confirm that transformers perform case-based reasoning, regardless of whether a scratchpad is used. This aligns with previous observations that transformers use subgraph matching/shortcut learning to reason. To mitigate this problem, we propose a Rule-Following Fine-Tuning (RFFT) technique that teaches transformers to perform rule-based reasoning. Specifically, we provide explicit rules in the input and then instruct transformers to recite and follow the rules step by step. Through RFFT, we enable LLMs fine-tuned on 1-5 digit addition to generalize to up to 12-digit addition with over 95% accuracy, which is over 40% higher than with scratchpad. This significant improvement demonstrates that teaching LLMs to explicitly use rules helps them learn rule-based reasoning and generalize better in length.
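The idea of placing explicit rules in the input and having the model recite and follow them step by step can be illustrated with a small data-construction sketch. This is only a minimal illustration under stated assumptions: the rule wording in `RULES`, the trace format, and the names `rule_following_trace` and `build_example` are hypothetical and are not taken from the paper, which does not specify its prompt format in this abstract.

```python
# Hypothetical rule text; the abstract does not give the actual wording used for RFFT.
RULES = (
    "1. Start at the rightmost digit with carry = 0.\n"
    "2. Add the two digits at the current position and the carry.\n"
    "3. Write (total mod 10) as the result digit; set carry = total div 10.\n"
    "4. Move one position left and repeat until no digits remain.\n"
    "5. If the carry is non-zero at the end, write it as the leading digit."
)


def rule_following_trace(a: int, b: int) -> tuple[list[str], int]:
    """Add two non-negative integers digit by digit, logging every rule application."""
    # Least-significant digit first, padded to equal length.
    xs = [int(d) for d in reversed(str(a))]
    ys = [int(d) for d in reversed(str(b))]
    n = max(len(xs), len(ys))
    xs += [0] * (n - len(xs))
    ys += [0] * (n - len(ys))

    carry = 0
    digits: list[int] = []
    steps: list[str] = []
    for i in range(n):
        total = xs[i] + ys[i] + carry
        digit, new_carry = total % 10, total // 10
        steps.append(
            f"position {i}: {xs[i]} + {ys[i]} + carry {carry} = {total}; "
            f"write {digit}, carry {new_carry}"
        )
        digits.append(digit)
        carry = new_carry
    if carry:
        steps.append(f"final carry {carry}: write {carry} as leading digit")
        digits.append(carry)

    result = int("".join(str(d) for d in reversed(digits)))
    return steps, result


def build_example(a: int, b: int) -> dict:
    """Assemble one (prompt, target) pair: rules in the input, step-by-step trace in the output."""
    steps, result = rule_following_trace(a, b)
    prompt = f"Rules:\n{RULES}\n\nQuestion: {a} + {b} = ?"
    target = "\n".join(steps) + f"\nAnswer: {result}"
    return {"prompt": prompt, "target": target}
```

Because the trace depends only on the per-digit rule and not on the operand length, the same procedure produces a valid target for 5-digit and 12-digit operands alike, which is the kind of length generalization the abstract attributes to rule-based reasoning.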