We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices, offering a more flexible and robust alternative to traditional rule-based systems. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience. Given in-vehicle hardware constraints, we apply state-of-the-art model compression techniques, including structured pruning, healing, and quantization, ensuring that the model fits within the resource limitations while maintaining acceptable performance. Our work focuses on optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best practices for enabling embedded models, including compression, task-specific fine-tuning, and vehicle integration. We demonstrate that, despite a significant reduction in model size that removes up to 2 billion parameters from the original model, our approach preserves the model's ability to handle complex in-vehicle tasks accurately and efficiently. Furthermore, by executing the model in a lightweight runtime environment, we achieve a generation speed of 11 tokens per second, making real-time, on-device inference feasible without hardware acceleration. Our results demonstrate the potential of SLMs to transform vehicle control systems, enabling more intuitive interactions between users and their vehicles for an enhanced driving experience.