Deep learning architectures with strong reasoning capabilities have driven significant advances in autonomous driving. Large language models (LLMs) applied in this field can describe driving scenes and behaviors with accuracy approaching that of human perception, particularly in visual tasks. Meanwhile, edge computing has developed rapidly, and its proximity to data sources has made edge devices increasingly important in autonomous driving: they process data locally, which reduces transmission delay and bandwidth usage and yields faster response times. In this work, we propose a driving behavior narration and reasoning framework that deploys LLMs on edge devices. The framework consists of multiple roadside units, each hosting an LLM; the units collect road data and communicate over 5G NSR/NR networks. Our experiments show that LLMs deployed on edge devices can achieve satisfactory response speeds. We also propose a prompt strategy that improves the system's narration and reasoning performance by integrating multi-modal information, including environmental, agent, and motion data. Experiments on the OpenDV-Youtube dataset demonstrate that our approach significantly improves performance on both the narration and reasoning tasks.
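The abstract does not specify how the prompt strategy combines its three information sources, so the following is only a minimal sketch of one way environmental, agent, and motion data could be merged into a single text prompt. The names `SceneContext` and `build_prompt`, the field layout, and the wording of the two task questions are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SceneContext:
    """Hypothetical container for the three information groups in the abstract."""
    environment: dict = field(default_factory=dict)  # e.g. weather, road type
    agents: list = field(default_factory=list)       # e.g. nearby road users
    motion: dict = field(default_factory=dict)       # e.g. speed, turning state


def build_prompt(ctx: SceneContext, task: str) -> str:
    """Assemble a plain-text prompt for the 'narration' or 'reasoning' task."""
    if task not in ("narration", "reasoning"):
        raise ValueError(f"unknown task: {task}")

    # Flatten each information group into one line; fall back to a
    # placeholder when a group is empty so the prompt layout stays fixed.
    env = ", ".join(f"{k}: {v}" for k, v in ctx.environment.items()) or "unknown"
    agents = ", ".join(ctx.agents) or "none detected"
    motion = ", ".join(f"{k}: {v}" for k, v in ctx.motion.items()) or "unknown"

    # Assumed question wording; the actual prompts are not given in the source.
    question = (
        "Describe the observed vehicle's current driving behavior."
        if task == "narration"
        else "Explain why the observed vehicle is behaving this way."
    )
    return (
        f"Environment: {env}\n"
        f"Agents: {agents}\n"
        f"Motion: {motion}\n"
        f"Task: {question}"
    )
```

In this sketch a roadside unit would fill a `SceneContext` from its own sensors and pass the result of `build_prompt(ctx, "narration")` or `build_prompt(ctx, "reasoning")` to its locally deployed LLM. Keeping the three groups on separate labeled lines is a design choice made here for readability, not something the abstract prescribes.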