Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis

Yuan Gao, Mattia Piccinini, Yuchen Zhang, Dingrui Wang, Korbinian Moller, Roberto Brusnicki, Baha Zarrouki, Alessio Gambi, Jan Frederik Totz, Kai Storms, Steven Peters, Andrea Stocco, Bassam Alrifaee, Marco Pavone, Johannes Betz

From arXiv; IEEE Open Journal of Intelligent Transportation Systems

For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to the development and validation of autonomous driving systems. Traditional scenario generation relies on rule-based systems, knowledge-driven models, and data-driven synthesis, which often yield limited diversity and unrealistic safety-critical cases. Foundation models, a new generation of pre-trained, general-purpose AI models, allow developers to process heterogeneous inputs (e.g., natural language, sensor data, HD maps, and control actions), enabling the synthesis and interpretation of complex driving scenarios. In this paper, we survey the application of foundation models to scenario generation and scenario analysis in autonomous driving (as of May 2025). Our survey presents a unified taxonomy that covers large language models, vision-language models, multimodal large language models, diffusion models, and world models for the generation and analysis of autonomous driving scenarios. In addition, we review the methodologies, open-source datasets, simulation platforms, and benchmark challenges, and we examine the evaluation metrics tailored explicitly to scenario generation and analysis. Finally, the survey concludes by highlighting open challenges and research questions and outlining promising future research directions. All reviewed papers are listed in a continuously maintained repository, which contains supplementary materials and is available at https://github.com/TUM-AVS/FM-for-Scenario-Generation-Analysis.
