Novel-view synthesis techniques achieve impressive results for static scenes but struggle when faced with the inconsistencies inherent to casual capture settings: varying illumination, scene motion, and other unintended effects that are difficult to model explicitly. We present an approach for leveraging generative video models to simulate the inconsistencies in the world that can occur during capture. We use this process, along with existing multi-view datasets, to create synthetic data for training a multi-view harmonization network that is able to reconcile inconsistent observations into a consistent 3D scene. We demonstrate that our world-simulation strategy significantly outperforms traditional augmentation methods in handling real-world scene variations, thereby enabling highly accurate static 3D reconstructions in the presence of a variety of challenging inconsistencies. Project page: https://alextrevithick.github.io/simvs