The field of 3D reconstruction from images has evolved rapidly in the past few years, first with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter offers a significant edge over NeRF in training and inference speed, as well as in reconstruction quality. Although 3DGS works well with dense input images, its unstructured, point-cloud-like representation quickly overfits in the more challenging setting of extremely sparse input images (e.g., 3 images), producing a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their positions, and prevent them from moving independently during optimization. Specifically, we introduce single-view and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements over state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
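As a rough illustration of two ingredients named above, the following sketch lifts a per-view depth map into a pixel-aligned grid of 3D points (one candidate Gaussian position per pixel) and evaluates a generic total variation penalty on that grid. It is not the authors' implementation: the abstract does not state the exact formulation, so the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the function names, and the choice of an anisotropic L1 total variation applied to positions are all assumptions made for illustration.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H, W, 3) grid of camera-space points.

    Assumes a simple pinhole camera; the intrinsics are hypothetical inputs.
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def total_variation(grid):
    """Anisotropic L1 total variation of an (H, W, C) grid.

    Sums the mean absolute difference between vertically and horizontally
    adjacent pixels, so smooth grids score low and noisy grids score high.
    """
    dh = np.abs(grid[1:, :, :] - grid[:-1, :, :]).mean()
    dw = np.abs(grid[:, 1:, :] - grid[:, :-1, :]).mean()
    return dh + dw

# Toy example: a flat surface at depth 2 seen by a 4x5 image.
depth = np.full((4, 5), 2.0)
positions = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
tv = total_variation(positions)
```

Because the positions live on the image grid, neighbouring Gaussians can be compared directly, which is what makes a penalty of this kind possible; an unstructured point cloud has no such built-in neighbourhood.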