Large Language Models (LLMs) prompted to generate a chain of thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition for complex, multi-step reasoning problems rely on the LLM's ability to simultaneously decompose and solve the problem. A significant drawback is that foundational LLMs are typically unavailable for fine-tuning, making adaptation computationally prohibitive. We argue (and demonstrate) that problem decomposition and solution generation are distinct capabilities, better addressed by separate modules than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to break complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B-parameter) LM as the decomposition generator, trained via policy gradient optimization to interact with a solver LM (treated as a black box) and guide it through the subproblems, thereby rendering our method solver-agnostic. Evaluation on multiple reasoning datasets reveals that with our method, a 175-billion-parameter LM (text-davinci-003) can match or even outperform its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capability as a function of scale; solver LMs of diverse sizes achieve significant performance improvements with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular fine-tuning technique over prompting exorbitantly large decomposer LLMs alone.
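The decomposer–solver interaction described above can be illustrated with a minimal sketch. All names here are hypothetical, and both models are stubbed with toy functions; in the actual system the decomposer is a fine-tuned 13B LM conditioned on the solver's initial attempt, and the solver is a black-box LM queried only through prompts.

```python
# Hedged sketch of a DaSLaM-style interaction loop (names hypothetical).
# The decomposer and solver are toy stand-ins, not the paper's models.
from typing import Callable, List

def decompose(question: str, draft_answer: str) -> List[str]:
    """Stand-in for the decomposition generator: split a multi-step
    question into subquestions that need fewer reasoning steps.
    (The real decomposer also conditions on the solver's draft answer;
    this toy heuristic ignores it and just splits on 'and' clauses.)"""
    parts = [p.strip() for p in question.split(" and ")]
    return [p if p.endswith("?") else p + "?" for p in parts]

def solve_with_decomposition(question: str,
                             solver: Callable[[str], str]) -> str:
    """Query the black-box solver once, decompose the problem,
    feed each subquestion back with accumulated context, then
    re-ask the original question with all sub-answers in context."""
    draft = solver(question)                 # solver's initial attempt
    context = ""
    for sub in decompose(question, draft):   # guided subproblems
        answer = solver(context + sub)
        context += f"Q: {sub}\nA: {answer}\n"
    return solver(context + question)        # final, informed answer
```

Because the solver is accessed only as a `Callable[[str], str]`, any LM of any size can be plugged in unchanged, which is what makes the approach solver-agnostic.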