The advent of Large Multimodal Models (LMMs) offers a promising way to address the limitations of modular design in autonomous driving, which often falters in open-world scenarios that require sustained environmental understanding and logical reasoning. In addition, embodied artificial intelligence facilitates policy optimization through closed-loop interaction, enabling continuous learning and thereby advancing autonomous driving toward embodied intelligent (EI) driving. However, relying solely on LMMs to enhance EI driving, without joint decision-making, constrains this capability. To tackle this challenge, this article introduces a novel semantics and policy dual-driven hybrid decision framework that ensures both continuous learning and joint decision-making. The framework combines LMMs for semantic understanding and cognitive representation with deep reinforcement learning (DRL) for real-time policy optimization. We start by introducing the foundational principles of EI driving and LMMs. We then examine the emerging opportunities this framework enables, including potential benefits and representative use cases. An experimental case study validates the superior performance of our framework on a lane-change planning task. Finally, we identify several future research directions for empowering EI driving to guide subsequent work.