Recent research has highlighted the potential of large language models (LLMs) to improve their problem-solving capabilities with the aid of suitable external tools. In this work, we advance this concept by introducing a closed-loop framework, LLMs As Tool Makers (LATM), in which LLMs create their own reusable tools for problem-solving. Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker and crafts tools for a set of tasks; 2) tool using: another LLM acts as the tool user and applies the tools built by the tool maker to solve problems. On the problem-solving server side, tool making enables continual tool generation and caching as new requests emerge, so that subsequent requests can access cached tools through their corresponding APIs, making task resolution more efficient. Because tool making requires more sophisticated capabilities, we assign it to a powerful but resource-intensive model, and delegate the simpler tool-using phase to a lightweight model. This division of labor allows the one-time cost of tool making to be amortized over many instances of tool using, significantly reducing the average cost while maintaining strong performance. Furthermore, caching and reusing tools provides a functional cache, which stores the functionality for a class of requests rather than the natural-language responses of LLMs, thus extending the applicability of the conventional cache mechanism. We evaluate our approach on various complex reasoning tasks, including Big-Bench tasks. With GPT-4 as the tool maker and GPT-3.5 as the tool user, LATM achieves performance equivalent to using GPT-4 for both roles, at a significantly lower inference cost.
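The functional cache described above can be illustrated with a minimal sketch. It is not the authors' implementation: `FunctionalCache`, `stub_tool_maker`, `stub_tool_user`, and the `"sort_numbers"` task type are hypothetical names chosen for illustration, and both models are replaced by plain Python stand-ins so the example runs without any API access. In the actual framework the tool maker would be a powerful LLM that generates the function, and the tool user a lightweight LLM that decides how to call it.

```python
from typing import Callable, Dict, List

# A "tool" here is simply a reusable Python function (an assumption of this sketch).
Tool = Callable[[List[int]], List[int]]


class FunctionalCache:
    """Stores one reusable tool per task type, rather than per-request answers."""

    def __init__(self, tool_maker: Callable[[str], Tool]) -> None:
        self._tool_maker = tool_maker
        self._tools: Dict[str, Tool] = {}
        self.maker_calls = 0  # number of (expensive) tool-making calls so far

    def get_tool(self, task_type: str) -> Tool:
        # Make the tool only on a cache miss; later requests reuse it.
        if task_type not in self._tools:
            self.maker_calls += 1
            self._tools[task_type] = self._tool_maker(task_type)
        return self._tools[task_type]


def stub_tool_maker(task_type: str) -> Tool:
    """Stand-in for the powerful model: returns a hand-written tool."""
    if task_type == "sort_numbers":
        return lambda xs: sorted(xs)
    raise ValueError(f"no tool available for task type: {task_type}")


def stub_tool_user(tool: Tool, request: List[int]) -> List[int]:
    """Stand-in for the lightweight model: here it just applies the tool."""
    return tool(request)


def solve(cache: FunctionalCache, task_type: str, request: List[int]) -> List[int]:
    tool = cache.get_tool(task_type)
    return stub_tool_user(tool, request)


cache = FunctionalCache(stub_tool_maker)
requests = [[3, 1, 2], [9, 7, 8], [5, 4]]
results = [solve(cache, "sort_numbers", r) for r in requests]
print(results)            # [[1, 2, 3], [7, 8, 9], [4, 5]]
print(cache.maker_calls)  # 1
```

Three requests of the same type trigger a single tool-making call, which is the amortization the abstract refers to: the cache key is the class of request, not the individual request text.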