Function calling, also known as tool use, is a core capability of modern LLM agents, but it is typically constrained by synchronous execution semantics: LLM decoding blocks until each function call completes, so end-to-end latency accumulates with every call. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution. This lets model decoding overlap with function execution and lets independent functions run in parallel when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, and requires neither fine-tuning nor changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. These results further reveal that LLMs possess a native capability to reason over symbolic futures, placeholders that represent unresolved execution results, which enables an asynchronous paradigm for model-tool interaction.
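To make the idea of symbolic futures concrete, the following is a minimal sketch and not AsyncFC's actual implementation. It assumes an `asyncio`-based executor; the `FutureTable` class, the `$f1`-style handle format, and the three toy tools are all hypothetical names introduced here for illustration. Each submitted call returns a handle immediately, later calls may take handles as arguments, and independent calls run concurrently.

```python
import asyncio
import time


# Hypothetical tools standing in for slow function implementations.
async def fetch_weather(city):
    await asyncio.sleep(0.2)
    return f"sunny in {city}"


async def fetch_news(city):
    await asyncio.sleep(0.2)
    return f"no news from {city}"


async def summarize(a, b):
    await asyncio.sleep(0.1)
    return f"{a}; {b}"


class FutureTable:
    """Maps symbolic handles such as '$f1' to in-flight asyncio tasks (illustrative only)."""

    def __init__(self):
        self._tasks = {}  # handle -> asyncio.Task

    def submit(self, fn, *args):
        # Start the call right away and hand back a symbolic handle without waiting.
        handle = f"$f{len(self._tasks) + 1}"
        self._tasks[handle] = asyncio.create_task(self._run(fn, args))
        return handle

    async def _run(self, fn, args):
        # Any argument that is a known handle is awaited first, which enforces dependencies.
        resolved = [
            await self._tasks[a] if isinstance(a, str) and a in self._tasks else a
            for a in args
        ]
        return await fn(*resolved)

    async def resolve(self, handle):
        return await self._tasks[handle]


async def main():
    table = FutureTable()
    start = time.perf_counter()
    f1 = table.submit(fetch_weather, "Oslo")  # returns a handle immediately
    f2 = table.submit(fetch_news, "Oslo")     # independent of f1, so it runs in parallel
    f3 = table.submit(summarize, f1, f2)      # depends on both unresolved futures
    # A model could keep decoding here while all three calls are in flight.
    result = await table.resolve(f3)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the two fetches overlap, so the total wait is roughly the longest dependency chain (about 0.3 s) rather than the sum of all three calls (0.5 s), which is the kind of saving the abstract attributes to inter-function parallelism.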