While current multi-frame restoration methods combine information from multiple input images using 2D alignment techniques, recent advances in novel view synthesis are paving the way for a new paradigm relying on volumetric scene representations. In this work, we introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements. Our method extends the multiplane image (MPI) framework for novel view synthesis by introducing a learnable encoder-renderer pair manipulating multiplane representations in feature space. The encoder fuses information across views and operates in a depth-wise manner while the renderer fuses information across depths and operates in a view-wise manner. The two modules are trained end-to-end and learn to separate depths in an unsupervised way, giving rise to Multiplane Feature (MPF) representations. Experiments on the Spaces and Real Forward-Facing datasets as well as on raw burst data validate our approach for view synthesis, multi-frame denoising, and view synthesis under noisy conditions.
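To make the division of labor between the two modules concrete, the following is a minimal NumPy sketch of the data flow only. It assumes the input views have already been warped onto a shared set of depth planes (a plane-sweep volume), and it replaces both learned networks with hand-written stand-ins: a cross-view mean and variance for the encoder, and a softmax over depths for the renderer. The function names, the `temperature` parameter, and the variance-based weighting are illustrative assumptions, not the method described above.

```python
import numpy as np


def encode_depthwise(psv):
    """Fuse information across views, independently for each depth plane.

    psv: array of shape (V, D, H, W, C), i.e. V input views assumed to be
    already warped onto D depth planes (the warping step is omitted here).
    Returns an array of shape (D, H, W, C + 1): the per-depth mean over views,
    plus the cross-view variance as a crude agreement cue.
    Hypothetical stand-in for a learned encoder.
    """
    mean = psv.mean(axis=0)                              # (D, H, W, C)
    var = psv.var(axis=0).mean(axis=-1, keepdims=True)   # (D, H, W, 1)
    return np.concatenate([mean, var], axis=-1)


def render_viewwise(mpf, temperature=0.01):
    """Fuse information across depths for a single target view.

    mpf: array of shape (D, H, W, C + 1), assumed to be already warped to the
    target view. Each pixel is a softmax-weighted sum over depth planes, where
    planes on which the input views agree (low variance) get more weight.
    Hypothetical stand-in for a learned renderer.
    """
    color, var = mpf[..., :-1], mpf[..., -1:]
    logits = -var / temperature
    logits -= logits.max(axis=0, keepdims=True)          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)        # sums to 1 over depth
    return (weights * color).sum(axis=0)                 # (H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psv = rng.random((4, 8, 16, 16, 3))   # 4 views, 8 depth planes, RGB
    mpf = encode_depthwise(psv)           # shape (8, 16, 16, 4)
    image = render_viewwise(mpf)          # shape (16, 16, 3)
    print(mpf.shape, image.shape)
```

In this sketch the encoder never mixes depth planes and the renderer never mixes views, which mirrors the depth-wise and view-wise structure described above. In the actual method both stages are learned and trained end-to-end rather than hand-designed.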