Vision-Language Models (VLMs) are increasingly applied in autonomous driving for unified perception and reasoning, but their high inference latency hinders real-time deployment. Early exit reduces latency by terminating inference at intermediate layers, yet its task-dependent nature limits generalization across diverse scenarios. We observe that autonomous driving naturally mitigates this limitation: navigation systems can anticipate upcoming contexts (e.g., intersections, traffic lights), indicating which tasks will be required. We propose Nav-EE, a navigation-guided early-exit framework that precomputes task-specific exit layers offline and dynamically applies them online based on navigation priors. Experiments on CODA, Waymo, and BOSCH show that Nav-EE achieves accuracy comparable to full inference while reducing latency by up to 63.9%. Real-vehicle integration with Autoware Universe further demonstrates reduced inference latency (from 600 ms to 300 ms), supporting faster decision-making in complex scenarios. These results suggest that coupling navigation foresight with early exit offers a viable path toward efficient deployment of large models in autonomous systems. Code and data are available at our anonymous repository: https://anonymous.4open.science/r/Nav-EE-BBC4