An LLM Compiler for Parallel Function Calling

Large Language Models (LLMs) have shown remarkable results on various complex reasoning benchmarks. The reasoning capabilities of LLMs enable them to execute function calls, using user-provided functions to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has expanded LLMs' scope to include multi-function calling, where LLMs are equipped with a variety of functions and select the proper functions based on the context. The multi-function calling abilities of LLMs have catalyzed LLM-based software development, allowing such software to tackle more complex problems. However, current methods for multi-function calling often require sequential reasoning and acting for each function, which can result in high latency, high cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multi-function calling. Drawing from the principles of classical compilers, LLMCompiler streamlines parallel function calling with three components: (i) an LLM Planner, which formulates execution strategies and dependencies; (ii) a Task Fetching Unit, which dispatches function calling tasks; and (iii) an Executor, which executes these tasks in parallel. LLMCompiler automatically computes an optimized orchestration for the function calls and can be used with open-source models such as LLaMA-2. We have benchmarked LLMCompiler on a range of tasks, including cases with non-trivial inter-dependency between function calls as well as cases that require dynamic replanning based on intermediate results. We observe consistent latency speedups of up to 3.7x, cost savings of up to 6.7x, and accuracy improvements of up to ~9% compared to ReAct. Additionally, LLMCompiler achieves up to a 1.35x latency gain over OpenAI's recent parallel function calling, while achieving similar accuracy.
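The three-component design described above can be illustrated with a minimal Python sketch. It is not the LLMCompiler implementation: the plan is written by hand rather than produced by an LLM Planner, and the names `Task`, `run_plan`, `fetch_number`, `add`, and `demo_plan` are hypothetical, as are the toy values they return. The loop plays the role of a Task Fetching Unit by dispatching every task whose dependencies are resolved, and `asyncio.gather` stands in for an Executor that runs those tasks concurrently. For simplicity the sketch dispatches in waves, whereas an eager scheduler could start each task as soon as its own dependencies finish.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Tuple


@dataclass
class Task:
    """One planned function call; `deps` lists the task ids whose results it consumes."""
    id: int
    func: Callable[..., Awaitable[Any]]
    deps: Tuple[int, ...] = ()


async def run_plan(tasks: List[Task]) -> Dict[int, Any]:
    """Run a dependency graph of tasks, executing all ready tasks concurrently."""
    results: Dict[int, Any] = {}
    pending = {t.id: t for t in tasks}
    while pending:
        # "Task fetching": select the tasks whose dependencies all have results.
        ready = [t for t in pending.values() if all(d in results for d in t.deps)]
        if not ready:
            raise ValueError("plan has a cycle or an unknown dependency")
        # "Executor": run the ready tasks in parallel, passing in dependency results.
        outputs = await asyncio.gather(
            *(t.func(*(results[d] for d in t.deps)) for t in ready)
        )
        for task, output in zip(ready, outputs):
            results[task.id] = output
            del pending[task.id]
    return results


async def fetch_number(name: str) -> int:
    """Hypothetical slow tool; the sleep stands in for network or model latency."""
    await asyncio.sleep(0.1)
    return {"a": 3, "b": 4}[name]  # toy values, assumed for illustration


async def add(x: int, y: int) -> int:
    """Hypothetical tool that combines the results of two earlier tasks."""
    return x + y


def demo_plan() -> List[Task]:
    """A hand-written plan; in LLMCompiler the LLM Planner would generate it."""
    return [
        Task(1, lambda: fetch_number("a")),
        Task(2, lambda: fetch_number("b")),
        Task(3, add, deps=(1, 2)),  # must wait for tasks 1 and 2
    ]


if __name__ == "__main__":
    print(asyncio.run(run_plan(demo_plan())))
```

In this toy plan, tasks 1 and 2 have no dependencies and run at the same time, so the two simulated 0.1-second calls overlap instead of running back to back; task 3 starts only once both results are available. A sequential reason-and-act loop would instead pay for each call in turn.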
