Despite the strong performance of large language models (LLMs) on tasks such as mathematical reasoning, their practical use is limited by high computational demands and proprietary restrictions. Chain-of-thought (CoT) and program-of-thought (PoT) fine-tuning are common methods for transferring LLM knowledge to small language models (SLMs). However, CoT often leads to calculation errors in SLMs, whereas PoT has shown more promise. Most PoT-based approaches focus on direct problem-to-code conversion, or on extracting only the key information from a question and then generating a code solution for it. In contrast, this work emphasizes filling the gaps in a question to clearly illustrate the solution path, since such information can be difficult for an SLM to infer when it is not explicitly provided. This paper therefore introduces Gap-Filling Prompting (GFP), a novel two-step prompting strategy designed to enhance the problem-solving process of SLMs. The first step identifies these gaps and provides hints for filling them; the second step adds the hints to the question to generate the final code solution. Experimental results on two benchmark datasets demonstrate that GFP significantly improves the mathematical reasoning abilities of SLMs.
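The two-step strategy described above can be sketched as a simple prompting pipeline. This is a minimal illustration, not the paper's implementation: the `generate` function is a hypothetical stand-in for a call to an SLM, and the prompt wording and the stubbed responses are assumptions made for demonstration.

```python
def generate(prompt: str) -> str:
    """Stub standing in for an SLM call; real usage would query a model API."""
    if "identify the missing information" in prompt:
        # Step-1 style response: a hint that fills a gap in the question.
        return "Hint: first compute the price of one apple, then scale to 7."
    # Step-2 style response: a Python code solution.
    return "price = 12 / 4\ntotal = price * 7\nprint(total)"


def gap_filling_prompting(question: str) -> str:
    """Two-step GFP sketch: (1) elicit gap-filling hints, (2) solve with code."""
    # Step 1: ask the model to identify gaps in the question and provide hints.
    hint_prompt = (
        f"Question: {question}\n"
        "Please identify the missing information (gaps) in this question "
        "and provide hints for filling them."
    )
    hints = generate(hint_prompt)

    # Step 2: add the hints to the question and request a final code solution.
    code_prompt = (
        f"Question: {question}\n"
        f"{hints}\n"
        "Write a Python program that solves the question."
    )
    return generate(code_prompt)


solution_code = gap_filling_prompting("4 apples cost $12. What do 7 apples cost?")
```

The key design point is that the second prompt sees both the original question and the step-1 hints, so the model generating code does not have to infer the implicit solution path on its own.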