With the rapid advancement of autonomous driving, deploying Vision-Language Models (VLMs) to enhance perception and decision-making has become increasingly common. However, the real-time application of VLMs is hindered by high latency and computational overhead, limiting their effectiveness in time-critical driving scenarios. This challenge is particularly evident when VLMs exhibit over-inference, continuing to process unnecessary layers even after confident predictions have been reached. To address this inefficiency, we propose AD-EE, an Early Exit framework that incorporates domain characteristics of autonomous driving and leverages causal inference to identify optimal exit layers. We evaluate our method on large-scale real-world autonomous driving datasets, including Waymo and the corner-case-focused CODA, as well as on a real vehicle running the Autoware Universe platform. Extensive experiments across multiple VLMs show that our method significantly reduces latency, with improvements of up to 57.58%, and enhances object detection accuracy, with gains of up to 44%.
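To illustrate the over-inference problem the abstract describes, the sketch below shows a generic confidence-threshold early exit over transformer layers: intermediate exit heads produce logits, and remaining layers are skipped once the prediction is confident. This is a minimal illustrative sketch only; AD-EE itself identifies exit layers via causal inference rather than a per-sample confidence threshold, and the module names (`layers`, `exit_heads`) are hypothetical.

```python
# Minimal sketch of confidence-based early exit (illustrative; not the AD-EE method).
import torch
import torch.nn.functional as F

def early_exit_forward(hidden, layers, exit_heads, threshold=0.9):
    """Run transformer blocks sequentially; stop once an intermediate
    exit head is confident enough, skipping the remaining layers."""
    logits = None
    for depth, (layer, head) in enumerate(zip(layers, exit_heads)):
        hidden = layer(hidden)                       # one transformer block
        logits = head(hidden[:, -1, :])              # hypothetical exit classifier
        confidence = F.softmax(logits, dim=-1).max(dim=-1).values
        if confidence.min() >= threshold:            # every sample in the batch is confident
            return logits, depth + 1                 # early exit: layers beyond this point are skipped
    return logits, len(layers)                       # fall through: full-depth inference
```

In this generic form, the threshold trades accuracy for latency; AD-EE instead fixes the exit point by estimating, per driving-domain input, which layer causally suffices for a correct prediction.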