Prospective Role of Foundation Models in Advancing Autonomous Vehicles

With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT and Sora, have achieved remarkable results in many fields, including natural language processing and computer vision. Applying FMs to autonomous driving holds considerable promise. For example, they can enhance scene understanding and reasoning. Pre-trained on rich linguistic and visual data, FMs can understand and interpret the various elements of a driving scene and perform cognitive reasoning that yields linguistic and action instructions for driving decision-making and planning. Furthermore, FMs can draw on their understanding of driving scenarios to augment data, generating plausible scenes for rare, long-tail events that are unlikely to be encountered during routine driving and data collection. Such augmentation can in turn improve the accuracy and reliability of autonomous driving systems. World Models, exemplified by the DREAMER series, offer further evidence of this potential by demonstrating an ability to capture physical laws and dynamics. Trained on massive data under the self-supervised learning paradigm, World Models can generate unseen yet plausible driving environments. This supports better prediction of road users' behaviors and the offline training of driving policies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By leveraging the capabilities of FMs, we aim to address the issues arising from the long-tail distribution in autonomous driving and thereby improve overall safety in this domain.