Modular end-to-end (ME2E) autonomous driving paradigms combine the interpretability of modular designs with the capability for global optimization, and they have demonstrated strong performance. However, existing studies focus mainly on improving accuracy and often overlook critical system-level factors such as inference latency and energy consumption, which leads to increasingly complex model designs that hinder practical deployment. Prior work on model compression and acceleration typically optimizes either the software side or the hardware side in isolation. Software-only optimization cannot fundamentally remove the overheads of intermediate tensor access and operator scheduling, whereas hardware-only optimization is constrained by model structure and precision. As a result, the real-world benefits of such optimizations are often limited. To address these challenges, this paper proposes a reusable software-hardware co-optimization and closed-loop evaluation framework for ME2E autonomous driving inference. The framework integrates software-level model optimization with hardware-level computation optimization under a unified system-level objective. In addition, a multidimensional evaluation metric is introduced that jointly considers safety, comfort, efficiency, latency, and energy, enabling quantitative comparison of different optimization strategies. Experiments across multiple ME2E autonomous driving stacks show that the proposed framework preserves baseline-level driving performance while significantly reducing inference latency and energy consumption, yielding substantial overall system-level improvements. These results indicate that the framework provides practical, actionable guidance for the efficient deployment of ME2E autonomous driving systems.
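To make the idea of a multidimensional metric concrete, the following is a minimal sketch of one way the five dimensions could be folded into a single score. The abstract does not give the paper's actual formula, so everything here is an assumption for illustration: the `RunMetrics` fields, the `WEIGHTS` values, the `cost_score` normalization against a baseline, and the weighted-sum aggregation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Closed-loop results for one configuration (hypothetical schema)."""
    safety: float      # assumed normalized to [0, 1], higher is better
    comfort: float     # assumed normalized to [0, 1], higher is better
    efficiency: float  # assumed normalized to [0, 1], higher is better
    latency_ms: float  # inference latency per frame, lower is better
    energy_j: float    # energy per frame, lower is better


# Hypothetical weights that sum to 1; the paper's weighting is not stated.
WEIGHTS = {
    "safety": 0.4,
    "comfort": 0.1,
    "efficiency": 0.1,
    "latency": 0.2,
    "energy": 0.2,
}


def cost_score(value: float, baseline: float) -> float:
    """Map a lower-is-better cost to (0, 1): 0.5 at baseline, toward 1 as cost shrinks."""
    if value < 0 or baseline <= 0:
        raise ValueError("costs must be non-negative and baseline positive")
    return baseline / (baseline + value)


def composite_score(run: RunMetrics, baseline: RunMetrics) -> float:
    """Weighted sum of the driving scores and the baseline-relative cost scores."""
    return (
        WEIGHTS["safety"] * run.safety
        + WEIGHTS["comfort"] * run.comfort
        + WEIGHTS["efficiency"] * run.efficiency
        + WEIGHTS["latency"] * cost_score(run.latency_ms, baseline.latency_ms)
        + WEIGHTS["energy"] * cost_score(run.energy_j, baseline.energy_j)
    )


if __name__ == "__main__":
    # Made-up numbers, not results from the paper.
    baseline = RunMetrics(0.9, 0.8, 0.7, latency_ms=100.0, energy_j=20.0)
    optimized = RunMetrics(0.9, 0.8, 0.7, latency_ms=50.0, energy_j=10.0)
    print(composite_score(baseline, baseline))
    print(composite_score(optimized, baseline))
```

Under this sketch, a configuration that keeps the same driving scores as the baseline but halves latency and energy receives a higher composite score, which is the kind of quantitative comparison between optimization strategies that the abstract describes. A weighted sum is only one possible aggregation; a real metric might, for example, treat safety as a hard constraint rather than a tradeable term.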