Leveraging the general world knowledge of Large Language Models (LLMs) holds significant promise for improving the ability of autonomous driving systems to handle rare and complex scenarios. While integrating LLMs into Vision-Language-Action (VLA) models has yielded state-of-the-art performance, their massive parameter counts pose severe challenges for latency-sensitive and energy-constrained deployment. Distilling LLM knowledge into a compact driving model offers a compelling way to retain these reasoning capabilities while keeping the computational footprint manageable. Although previous works have demonstrated the efficacy of distillation, they have focused primarily on relatively simple scenarios and open-loop evaluation. In this work, we therefore investigate LLM distillation in more complex, interactive scenarios under closed-loop evaluation. We demonstrate that, through a combination of latent feature distillation and ground-truth trajectory supervision, an efficient vision-only student model, \textbf{Orion-Lite}, can even surpass the performance of its massive VLA teacher, ORION, setting a new state of the art on the rigorous Bench2Drive benchmark with a Driving Score of 80.6. Ultimately, this result indicates that vision-only architectures still possess significant untapped potential for high-performance reactive planning.
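As a rough illustration of how latent feature distillation can be combined with ground-truth trajectory supervision, the sketch below sums a feature-matching term and a trajectory-regression term. The abstract does not specify the loss forms or weights, so everything here is an assumption: the mean-squared-error feature term, the L1 trajectory term, and the names `latent_mse`, `trajectory_l1`, `combined_loss`, `feat_weight`, and `traj_weight` are hypothetical and are not taken from Orion-Lite or ORION.

```python
from typing import Sequence, Tuple

Waypoint = Tuple[float, float]  # (x, y) position of one planned waypoint


def latent_mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between student and teacher latent features.

    Assumption: the teacher features are treated as fixed targets.
    """
    if len(student) != len(teacher):
        raise ValueError("latent vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def trajectory_l1(pred: Sequence[Waypoint], gt: Sequence[Waypoint]) -> float:
    """Mean absolute error over the x and y coordinates of all waypoints."""
    if len(pred) != len(gt):
        raise ValueError("trajectories must have the same number of waypoints")
    total = sum(abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt))
    return total / (2 * len(pred))


def combined_loss(
    student_latent: Sequence[float],
    teacher_latent: Sequence[float],
    pred_traj: Sequence[Waypoint],
    gt_traj: Sequence[Waypoint],
    feat_weight: float = 1.0,  # hypothetical weight on the distillation term
    traj_weight: float = 1.0,  # hypothetical weight on the supervision term
) -> float:
    """Weighted sum of the feature-distillation and trajectory terms."""
    return (feat_weight * latent_mse(student_latent, teacher_latent)
            + traj_weight * trajectory_l1(pred_traj, gt_traj))


if __name__ == "__main__":
    loss = combined_loss(
        student_latent=[1.0, 2.0],
        teacher_latent=[1.0, 4.0],
        pred_traj=[(0.0, 0.0), (1.0, 1.0)],
        gt_traj=[(0.0, 1.0), (1.0, 3.0)],
        feat_weight=0.5,
    )
    print(loss)
```

In a real training loop these terms would operate on batched tensors in a deep learning framework; plain Python lists are used here only to keep the arithmetic easy to follow.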