Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. However, their performance is commonly evaluated in an interpolated setup with highly correlated training and test views. In contrast, extrapolation, where test views deviate significantly from training views, remains underexplored, limiting progress in generalizable simulation technology. To address this gap, we leverage publicly available AV datasets with multiple traversals, multiple vehicles, and multiple cameras to build the first Extrapolated Urban View Synthesis (EUVS) benchmark. Using this benchmark, we conduct quantitative and qualitative evaluations of state-of-the-art Gaussian Splatting methods across different difficulty levels. Our results show that Gaussian Splatting is prone to overfitting to training views. Moreover, incorporating diffusion priors and improving geometry cannot fundamentally improve NVS under large view changes, highlighting the need for more robust approaches and large-scale training. We have released our data to help advance self-driving and urban robotics simulation technology.
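The contrast between interpolated and extrapolated evaluation can be made concrete with a minimal sketch of the two train/test splits. This is an illustrative example only, assuming hypothetical frame indices and traversal IDs; it does not reproduce the EUVS benchmark's actual split protocol.

```python
def interpolated_split(frames, hold_every=8):
    """Interpolated setup: hold out every Nth frame along one traversal,
    so test views lie between (and correlate strongly with) training views."""
    train = [f for i, f in enumerate(frames) if i % hold_every != 0]
    test = [f for i, f in enumerate(frames) if i % hold_every == 0]
    return train, test

def extrapolated_split(frames_by_traversal, test_traversal):
    """Extrapolated setup: hold out an entire traversal (e.g. a different
    lane or vehicle pass), so test poses deviate largely from training poses."""
    train = [f for t, fs in frames_by_traversal.items()
             if t != test_traversal for f in fs]
    test = frames_by_traversal[test_traversal]
    return train, test
```

Under the interpolated split, each held-out view is bracketed by nearby training views, which masks overfitting; the extrapolated split removes that correlation and exposes it.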