Bird's-Eye View (BEV) perception has received increasing attention in recent years because it provides a concise, unified spatial representation across views and benefits a diverse set of downstream driving applications. At the same time, data-driven simulation for autonomous driving has been a focal point of recent research, but few approaches are both fully data-driven and controllable. Rather than relying on perception data from real-life scenarios, an ideal simulation model would generate realistic street-view images that align with a given HD map and traffic layout, a task that is critical for visualizing complex traffic scenarios and developing robust perception models for autonomous driving. In this paper, we propose BEVGen, a conditional generative model that synthesizes a set of realistic and spatially consistent surrounding images that match the BEV layout of a traffic scenario. BEVGen incorporates a novel cross-view transformation with a spatial attention design, which learns the relationship between the camera views and the map view to ensure their consistency. We evaluate the proposed model on the challenging NuScenes and Argoverse 2 datasets. After training, BEVGen can accurately render roads and lane lines and can generate traffic scenes under diverse weather conditions and at different times of day.