As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems still struggle with unexpected open-set environments and with the opacity of black-box models. At the same time, progress in deep learning has produced larger multimodal foundation models that offer joint visual and textual understanding. In this paper, we harness these multimodal foundation models to improve the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to end-to-end, open-set (any environment/scene) autonomous driving that produces driving decisions from representations that can be queried by image and by text. To do so, we introduce a method for extracting spatial (pixel/patch-aligned) features from transformers, so that the resulting representation captures both spatial and semantic information. Our approach (i) achieves strong results across diverse tests while being significantly more robust in out-of-distribution situations, and (ii) supports latent space simulation driven by text, which can be used for text-based data augmentation during training and for policy debugging. We encourage the reader to watch our explainer video at https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be and to view the code and demos on our project webpage at https://drive-anywhere.github.io/.
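To make the idea of text-queryable, patch-aligned features more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each image patch and each text prompt has already been mapped into a shared embedding space by some multimodal foundation model; the toy 3-dimensional vectors, the function names `query_patches` and `swap_concept`, and the similarity threshold are all hypothetical and chosen only for illustration. The sketch shows two operations the abstract alludes to: scoring patches against a text query, and a simple form of text-driven latent augmentation in which patches matching one concept are replaced by the embedding of another.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returns a copy; zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def query_patches(patch_feats, text_emb):
    """Score every patch feature against a text embedding."""
    return [cosine(p, text_emb) for p in patch_feats]


def swap_concept(patch_feats, src_emb, dst_emb, threshold=0.8):
    """Hypothetical latent augmentation: patches similar to src_emb are
    replaced by dst_emb; all other patches are copied unchanged."""
    return [
        list(dst_emb) if cosine(p, src_emb) >= threshold else list(p)
        for p in patch_feats
    ]


# Toy data (assumption): four patches with 3-dimensional features.
patch_feats = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
tree = [1.0, 0.0, 0.0]   # stand-in for the embedding of the text "tree"
house = [0.0, 1.0, 0.0]  # stand-in for the embedding of the text "house"

scores = query_patches(patch_feats, tree)
augmented = swap_concept(patch_feats, tree, house)
```

In this toy setup, `scores` indicates which patches respond to the "tree" query, and `augmented` is a copy of the feature grid in which those patches now carry the "house" embedding. A downstream policy could be trained or probed on such modified features, under the assumption that it consumes patch-level features directly.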