Detecting a diverse range of objects across varied driving scenarios is essential for the effectiveness of autonomous driving systems. However, collected real-world data often lacks the necessary diversity and follows a long-tail distribution. Although synthetic data has been used to address this issue by generating virtual scenes, it faces hurdles such as a significant domain gap and the substantial effort required from 3D artists to create realistic environments. To overcome these challenges, we present ARSim, a fully automated, comprehensive, modular framework designed to enhance real multi-view image data with 3D synthetic objects of interest. The proposed method integrates domain adaptation and randomization strategies to address covariate shift between real and simulated data: essential domain attributes are inferred from real data, and the remaining attributes are varied through simulation-based randomization. We construct a simplified virtual scene from real data and strategically place 3D synthetic assets within it. The assets are lit using a light distribution estimated from multiple images capturing the surroundings of the vehicle. Camera parameters from the real data are used to render the synthetic assets in each frame. The resulting augmented, multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles. Experimental results on various AV perception tasks demonstrate the superior performance of networks trained on the augmented dataset.
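To make the multi-view rendering step concrete, the following is a minimal sketch of how one 3D point on a synthetic asset could be projected into several calibrated cameras with a standard pinhole model. It is not ARSim's implementation: the `Camera` class, the `project` and `visible_pixels` functions, and all numeric values are hypothetical and chosen only for illustration. A full pipeline would render complete assets under the estimated lighting rather than single points.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera: intrinsics plus a world-to-camera pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    rotation: Tuple[Vec3, Vec3, Vec3]  # 3x3 world-to-camera rotation, row-major
    translation: Vec3                  # world-to-camera translation
    width: int
    height: int


def project(cam: Camera, point_world: Vec3) -> Optional[Pixel]:
    """Return the pixel where a world point lands, or None if it is not visible."""
    # Move the point into the camera frame: p_cam = R * p_world + t.
    x, y, z = (
        sum(r * p for r, p in zip(row, point_world)) + t
        for row, t in zip(cam.rotation, cam.translation)
    )
    if z <= 0.0:  # the point is behind the image plane
        return None
    # Perspective division followed by the intrinsic mapping to pixels.
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    if 0.0 <= u < cam.width and 0.0 <= v < cam.height:
        return (u, v)
    return None


def visible_pixels(cameras: Dict[str, Camera], point_world: Vec3) -> Dict[str, Pixel]:
    """Project one world point into every camera and keep the views that see it."""
    hits = {}
    for name, cam in cameras.items():
        pixel = project(cam, point_world)
        if pixel is not None:
            hits[name] = pixel
    return hits


# Illustrative rig (assumed values): a front camera aligned with the world axes
# and a rear camera rotated 180 degrees about the vertical axis.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
FLIPPED = ((-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0))
cameras = {
    "front": Camera(1000.0, 1000.0, 960.0, 540.0, IDENTITY, (0.0, 0.0, 0.0), 1920, 1080),
    "rear": Camera(1000.0, 1000.0, 960.0, 540.0, FLIPPED, (0.0, 0.0, 0.0), 1920, 1080),
}

# A point 10 m ahead of the vehicle is seen by the front camera only.
views = visible_pixels(cameras, (1.0, 0.5, 10.0))
```

Because every camera projects the same world-space point, an asset placed once in the virtual scene appears at geometrically consistent positions in all views that observe it, which is the property the abstract refers to as multi-view consistency.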