Recent methods have demonstrated that Large Language Models (LLMs) solve reasoning tasks better when encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase, and show that this strategy outperforms a single-stage solution. Further, we hypothesize that decomposition should be easier to distill into a smaller model than problem solving, because the latter requires large amounts of domain knowledge while the former only requires learning general problem-solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase while achieving good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance, and the resulting distilled model struggles to generalize. These results indicate that by pairing smaller, distilled problem decomposition models with problem-solving LLMs, we can achieve reasoning with cost-efficient inference and local adaptation.