Existing methods for the 4D reconstruction of general, non-rigidly deforming objects focus on novel-view synthesis and neglect correspondences. However, time consistency enables advanced downstream tasks like 3D editing, motion analysis, or virtual-asset creation. We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our dynamic-NeRF method takes multi-view RGB videos and background images from static cameras with known camera parameters as input. It then reconstructs the deformations of an estimated canonical model of the geometry and appearance in an online fashion. Since this canonical model is time-invariant, we obtain correspondences even for long-term, long-range motions. We employ neural scene representations to parametrize the components of our method. Like prior dynamic-NeRF methods, we use a backwards deformation model. We find non-trivial adaptations of this model necessary to handle larger motions: We decompose the deformations into a strongly regularized coarse component and a weakly regularized fine component, where the coarse component also extends the deformation field into the space surrounding the object, which enables tracking over time. We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.
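To make the coarse/fine decomposition of the backward deformation model concrete, the following is a minimal toy sketch, not the SceNeRFlow implementation. It assumes that the coarse offset is applied first and the fine offset is then evaluated at the coarsely warped point; the abstract does not state the composition order. The analytic fields `coarse_offset` and `fine_offset`, the weights `w_coarse` and `w_fine`, and the form of the penalty are all hypothetical stand-ins for the learned neural fields and the actual regularizers.

```python
import numpy as np


def coarse_offset(x, t):
    """Toy coarse field (assumption): a smooth, time-dependent translation.

    It is defined for every point in space, not only on the object,
    mirroring the idea that the coarse component extends into the
    surrounding space.
    """
    return np.broadcast_to(np.array([-0.5 * t, 0.0, 0.0]), x.shape)


def fine_offset(x, t):
    """Toy fine field (assumption): a small, spatially varying residual."""
    return 0.01 * t * np.sin(x)


def to_canonical(x, t):
    """Backward deformation: map points observed at time t to canonical space."""
    x_coarse = x + coarse_offset(x, t)           # strongly regularized part
    return x_coarse + fine_offset(x_coarse, t)   # weakly regularized residual


def regularizer(x, t, w_coarse=10.0, w_fine=0.1, eps=1e-3):
    """Hypothetical penalty: large weight on coarse variation, small on fine size."""
    shifted = x + np.array([eps, 0.0, 0.0])
    # Finite-difference estimate of how much the coarse field varies in space.
    coarse_variation = (coarse_offset(shifted, t) - coarse_offset(x, t)) / eps
    fine = fine_offset(x + coarse_offset(x, t), t)
    return w_coarse * np.mean(coarse_variation ** 2) + w_fine * np.mean(fine ** 2)


# Points in canonical space, and the same points after moving +0.5 along x at t = 1.
canonical_pts = np.array([[0.0, 0.0, 0.0], [0.2, -0.1, 0.3]])
observed_pts = canonical_pts + np.array([0.5, 0.0, 0.0])

recovered = to_canonical(observed_pts, t=1.0)
penalty = regularizer(observed_pts, t=1.0)
```

Because every time step maps into the same time-invariant canonical space, two observations at different times correspond when they land on (approximately) the same canonical point. In this toy setup, `recovered` differs from `canonical_pts` only by the small fine residual.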