Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks. Advances in prompt engineering and fine-tuning techniques have further enhanced their ability to address complex reasoning challenges. However, these advanced capabilities are often exclusive to models exceeding 100 billion parameters. Although Chain-of-Thought (CoT) fine-tuning methods have been explored for smaller models (under 10 billion parameters), they typically depend on extensive CoT training data, which can introduce inconsistencies and limit effectiveness in low-data settings. To overcome these limitations, this paper introduces a new reasoning strategy, Solution Guidance (SG), and a plug-and-play training paradigm, Solution-Guidance Fine-Tuning (SGFT), for enhancing the reasoning capabilities of small language models (SLMs). SG focuses on problem understanding and decomposition at the semantic and logical levels rather than on specific computations, which can effectively improve the generalization and reasoning abilities of SLMs. With only a small amount of SG training data, SGFT can fine-tune an SLM to produce accurate problem-solving guidance, which can then be flexibly fed to any SLM as a prompt, enabling it to generate correct answers directly. Experimental results demonstrate that our method significantly improves the performance of SLMs on various reasoning tasks, enhancing both their practicality and efficiency in resource-constrained environments.
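The plug-and-play inference flow described above has two stages. First, an SGFT-tuned model writes solution guidance for a problem. Then that guidance is passed as part of the prompt to a second, possibly different, SLM, which produces the answer. The sketch below shows only this composition under stated assumptions. The prompt templates, the function names (`build_guidance_prompt`, `build_answer_prompt`, `solve_with_guidance`), and the stub models are hypothetical illustrations, not the authors' implementation. The SGFT fine-tuning step itself is not shown.

```python
from typing import Callable

# Hypothetical type: any text-in, text-out model (e.g., a wrapper around an SLM).
TextModel = Callable[[str], str]


def build_guidance_prompt(question: str) -> str:
    """Hypothetical prompt for the SGFT-tuned model: request problem
    understanding and decomposition, not the computation itself."""
    return (
        "Read the problem, explain what it asks, and break it into steps. "
        "Do not compute the final answer.\n"
        f"Problem: {question}\nSolution guidance:"
    )


def build_answer_prompt(question: str, guidance: str) -> str:
    """Hypothetical prompt for the answering SLM: the generated guidance is
    included as context so the model can answer directly."""
    return (
        f"Problem: {question}\n"
        f"Solution guidance:\n{guidance}\n"
        "Following the guidance, give the final answer.\nAnswer:"
    )


def solve_with_guidance(question: str, guide_model: TextModel,
                        answer_model: TextModel) -> str:
    """Stage 1: generate guidance. Stage 2: feed it to any SLM as a prompt."""
    guidance = guide_model(build_guidance_prompt(question))
    return answer_model(build_answer_prompt(question, guidance))


# Stub models standing in for real SLMs (hypothetical, for illustration only).
def stub_guide(prompt: str) -> str:
    return "1. Identify the starting number of apples.\n2. Add the apples bought."


def stub_answer(prompt: str) -> str:
    # A real SLM would read the guidance; this stub returns a fixed answer
    # only when guidance is present in its prompt.
    return "8" if "Solution guidance" in prompt else "?"


if __name__ == "__main__":
    q = "Tom has 3 apples and buys 5 more. How many apples does he have now?"
    print(solve_with_guidance(q, stub_guide, stub_answer))
```

Because the two stages communicate only through a text prompt, the guidance model and the answering model can be swapped independently. This is one way to read the "plug-and-play" claim in the abstract.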