Neural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles whose cameras face forward and move along the direction of travel. Although these methods can successfully synthesize views similar to the training camera trajectory, steering the novel view outside the training camera distribution does not guarantee on-par performance. In this paper, we tackle the Extrapolated View Synthesis (EVS) problem by evaluating reconstructions on views such as looking left, right, or downwards with respect to the training camera distribution. To improve rendering quality for EVS, we initialize our model by constructing a dense LiDAR map, and propose to leverage prior scene knowledge such as a surface normal estimator and a large-scale diffusion model. Qualitative and quantitative comparisons demonstrate the effectiveness of our method on EVS. To the best of our knowledge, we are the first to address the EVS problem in urban scene reconstruction. Link to our project page: https://vegs3d.github.io/.
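As a rough illustration of the "dense LiDAR map" initialization step mentioned above, the sketch below aggregates per-frame LiDAR scans into a single world-frame point cloud using known sensor poses, then voxel-downsamples it. This is only a minimal sketch of the general idea, not the paper's actual pipeline; the function name, voxel size, and data layout (per-scan point arrays plus 4x4 sensor-to-world transforms) are assumptions for illustration.

```python
# Minimal sketch (assumed inputs): scans[i] is an (N_i, 3) array of LiDAR
# points in the sensor frame, poses[i] is the matching 4x4 sensor-to-world
# transform. Names here are illustrative, not from the paper.
import numpy as np

def build_dense_lidar_map(scans, poses, voxel_size=0.1):
    """Aggregate LiDAR scans into a dense world-frame point cloud."""
    world_points = []
    for pts, T in zip(scans, poses):
        homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N, 4) homogeneous points
        world_points.append((homog @ T.T)[:, :3])             # transform to world frame
    cloud = np.concatenate(world_points, axis=0)

    # Voxel-grid downsample: keep one point per occupied voxel to de-duplicate
    # overlapping returns from consecutive scans.
    keys = np.floor(cloud / voxel_size).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return cloud[idx]
```

The resulting point cloud could then serve as the initialization for the scene representation (e.g., as initial point positions), which is the role the dense LiDAR map plays in the abstract.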