We introduce Scenario Dreamer, a fully data-driven generative simulator for autonomous vehicle planning that generates both the initial traffic scene, comprising a lane graph and agent bounding boxes, and closed-loop agent behaviours. Existing methods for generating driving simulation environments encode the initial traffic scene as a rasterized image and, as such, require parameter-heavy networks that perform unnecessary computation due to the many empty pixels in the rasterized scene. Moreover, we find that existing methods that employ rule-based agent behaviours lack diversity and realism. Scenario Dreamer instead employs a novel vectorized latent diffusion model for initial scene generation that operates directly on the vectorized scene elements, and an autoregressive Transformer for data-driven agent behaviour simulation. Scenario Dreamer additionally supports scene extrapolation via diffusion inpainting, enabling the generation of unbounded simulation environments. Extensive experiments show that Scenario Dreamer outperforms existing generative simulators in realism and efficiency: the vectorized scene-generation base model achieves superior generation quality with around 2x fewer parameters, 6x lower generation latency, and 10x fewer GPU training hours compared to the strongest baseline. We confirm its practical utility by showing that reinforcement learning planning agents are more challenged in Scenario Dreamer environments than in traditional non-generative simulation environments, especially in long and adversarial driving scenarios.