We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices, offering a more flexible and robust alternative to traditional rule-based systems. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience. Given in-vehicle hardware constraints, we apply state-of-the-art model compression techniques, including structured pruning, healing, and quantization, ensuring that the model fits within the resource limitations while maintaining acceptable performance. Our work focuses on optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best practices for enabling embedded models, including compression, task-specific fine-tuning, and vehicle integration. We demonstrate that, despite a significant reduction in model size that removes up to 2 billion parameters from the original model, our approach preserves the model's ability to handle complex in-vehicle tasks accurately and efficiently. Furthermore, by executing the model in a lightweight runtime environment, we achieve a generation speed of 11 tokens per second, making real-time, on-device inference feasible without hardware acceleration. Our results demonstrate the potential of SLMs to transform vehicle control systems, enabling more intuitive interactions between users and their vehicles for an enhanced driving experience.