We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose building a cost volume representation via plane sweeping in 3D space, where the cross-view feature similarities stored in the cost volume provide valuable geometry cues for depth estimation. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers, relying only on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussian Splatting models via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses $10\times$ fewer parameters and infers more than $2\times$ faster, while providing higher appearance and geometry quality as well as better cross-dataset generalization.
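To make the cost volume idea concrete, here is a minimal sketch of plane-sweep cost volume construction. This is a hypothetical toy illustration, not the paper's actual implementation: for each candidate depth plane, source-view features are warped into the reference view via the plane-induced homography, and the cross-view feature correlation at each pixel is stored as the matching cost. All camera parameters, feature maps, and function names below are assumptions for illustration.

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    # Homography induced by the plane n^T X = d (in the reference frame),
    # mapping reference-view pixels to source-view pixels.
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def build_cost_volume(feat_ref, feat_src, K, R, t, depths):
    # feat_ref, feat_src: (H, W, C) per-view feature maps.
    H, W, C = feat_ref.shape
    n = np.array([0.0, 0.0, 1.0])  # fronto-parallel sweep planes
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates of the reference view, shape (3, H*W).
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    cost = np.zeros((len(depths), H, W))
    for i, d in enumerate(depths):
        Hmat = plane_homography(K, R, t, n, d)
        p = Hmat @ pix                        # project ref pixels into src view
        p = (p[:2] / p[2]).round().astype(int)  # nearest-neighbor lookup
        valid = (p[0] >= 0) & (p[0] < W) & (p[1] >= 0) & (p[1] < H)
        warped = np.zeros((H * W, C))
        warped[valid] = feat_src[p[1, valid], p[0, valid]]
        # Cross-view similarity: dot product of ref and warped src features.
        cost[i] = (feat_ref.reshape(-1, C) * warped).sum(-1).reshape(H, W)
    return cost  # (D, H, W): per-pixel matching score at each depth candidate

# Toy usage with random features and an arbitrary relative pose; we only
# check tensor shapes here, since random data gives no meaningful depth.
rng = np.random.default_rng(0)
feat_ref = rng.standard_normal((16, 16, 8))
K = np.array([[20.0, 0, 8], [0, 20.0, 8], [0, 0, 1]])
R, t = np.eye(3), np.array([0.2, 0.0, 0.0])
depths = np.linspace(1.0, 5.0, 8)
cv = build_cost_volume(feat_ref, feat_ref.copy(), K, R, t, depths)
depth_map = depths[cv.argmax(0)]  # pick the best-matching depth per pixel
```

In a learned model such as the one described above, the argmax would typically be replaced by a soft (softmax-weighted) depth regression so the whole pipeline stays differentiable under photometric supervision.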