In this work, we aim to reconstruct a time-varying 3D model, capable of producing photo-realistic renderings with independent control of viewpoint, illumination, and time, from Internet photos of large-scale landmarks. The core challenges are twofold. First, different types of temporal changes, such as illumination changes and changes to the underlying scene itself (for example, replacing one graffiti artwork with another), are entangled in the imagery. Second, scene-level temporal changes are often discrete and sporadic over time, rather than continuous. To tackle these problems, we propose a new scene representation equipped with a novel temporal step function encoding method that can model discrete scene-level content changes as piecewise constant functions over time. Specifically, we represent the scene as a space-time radiance field with a per-image illumination embedding, where temporally-varying scene changes are encoded using a set of learned step functions. To facilitate our task of chronology reconstruction from Internet imagery, we also collect a new dataset of four scenes that exhibit various changes over time. We demonstrate that our method achieves state-of-the-art view synthesis results on this dataset, while providing independent control of viewpoint, time, and illumination.
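To make the idea of encoding time with step functions concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a normalized scalar time `t`, a hypothetical list of transition times `transitions`, and hypothetical per-transition offsets `deltas`. The function names, the `steepness` parameter, and the logistic surrogate used to keep the encoding differentiable are all illustrative assumptions.

```python
import math


def hard_step_encoding(t, transitions):
    """One 0/1 feature per transition time: 1.0 once t has reached it."""
    return [1.0 if t >= u else 0.0 for u in transitions]


def smooth_step_encoding(t, transitions, steepness=50.0):
    """Differentiable surrogate for the hard steps (an assumption, not the paper's form).

    Each feature is a logistic curve centered on a transition time, written
    with tanh so that large arguments do not overflow.
    """
    return [0.5 * (1.0 + math.tanh(0.5 * steepness * (t - u))) for u in transitions]


def piecewise_constant(t, transitions, base, deltas):
    """Scalar signal that starts at `base` and jumps by deltas[i] at transitions[i]."""
    steps = hard_step_encoding(t, transitions)
    return base + sum(d * s for d, s in zip(deltas, steps))


# Hypothetical example: two scene changes, at 30% and 70% of the timeline.
transitions = [0.3, 0.7]
deltas = [2.0, -0.5]

before = piecewise_constant(0.1, transitions, base=1.0, deltas=deltas)
middle = piecewise_constant(0.5, transitions, base=1.0, deltas=deltas)
after = piecewise_constant(0.9, transitions, base=1.0, deltas=deltas)
```

In this sketch the signal is constant between transitions and changes only at them, which matches the piecewise constant behavior described above. In a learned setting, the transition times and offsets would be trainable parameters, and a smooth surrogate such as `smooth_step_encoding` is one common way to let gradients reach the transition times.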