Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis

Yuan Gao, Mattia Piccinini, Yuchen Zhang, Dingrui Wang, Korbinian Moller, Roberto Brusnicki, Baha Zarrouki, Alessio Gambi, Jan Frederik Totz, Kai Storms, Steven Peters, Andrea Stocco, Bassam Alrifaee, Marco Pavone, Johannes Betz

From arXiv. Final version (accepted by the IEEE Open Journal of Intelligent Transportation Systems).

For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to the development and validation of autonomous driving systems. Traditional scenario generation relies on rule-based systems, knowledge-driven models, and data-driven synthesis, which often yield limited diversity and unrealistic safety-critical cases. With the emergence of foundation models, a new generation of pre-trained, general-purpose AI models, developers can process heterogeneous inputs (e.g., natural language, sensor data, HD maps, and control actions), enabling the synthesis and interpretation of complex driving scenarios. In this paper, we survey the application of foundation models to scenario generation and scenario analysis in autonomous driving (as of May 2025). Our survey presents a unified taxonomy that covers large language models, vision-language models, multimodal large language models, diffusion models, and world models for the generation and analysis of autonomous driving scenarios. In addition, we review the methodologies, open-source datasets, simulation platforms, and benchmark challenges, and we examine the evaluation metrics tailored specifically to scenario generation and analysis. Finally, the survey highlights open challenges and research questions and outlines promising future research directions. All reviewed papers are listed in a continuously maintained repository, which contains supplementary materials and is available at https://github.com/TUM-AVS/FM-for-Scenario-Generation-Analysis.
