Recent large language models (LLMs) have enabled the development of advanced agentic systems that integrate various tools and APIs to fulfill user queries through function calling. However, deploying these LLMs at the edge remains largely unexplored, as their substantial model size and computational demands typically require cloud-based infrastructure. To this end, we present TinyAgent, an end-to-end framework for training and deploying task-specific small language model agents capable of function calling for driving agentic systems at the edge. We first show how to enable accurate function calling for open-source models via the LLMCompiler framework. We then systematically curate a high-quality dataset for function calling, which we use to fine-tune two small language models, TinyAgent-1.1B and TinyAgent-7B. For efficient inference, we introduce a novel tool retrieval method to reduce the input prompt length and apply quantization to further accelerate inference. As a driving application, we demonstrate a local Siri-like system for Apple's MacBook that can execute user commands through text or voice input. Our results show that our models can achieve, and even surpass, the function-calling capabilities of larger models such as GPT-4-Turbo, while being fully deployed at the edge. We open-source our dataset, models, and installable package, and provide a demo video for our MacBook assistant agent.