We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering scheme that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning on specific scenes, we introduce a multi-view geometrically consistent aggregation strategy that effectively aggregates the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis at a lower training cost. Extensive experiments on the DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.
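The abstract does not spell out contribution 3), but multi-view geometrically consistent filtering is a standard step in MVS pipelines: a depth estimate survives only if projecting it into a source view and reprojecting the source depth back lands near the original pixel at a similar depth. Below is a minimal NumPy sketch of that round-trip check, assuming pinhole intrinsics K (3x3), world-to-camera extrinsics T (x_cam = R @ x_world + t), and equally sized depth maps; the function names, thresholds, and nearest-neighbor depth sampling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def reprojection_errors(depth_ref, K_ref, T_ref, depth_src, K_src, T_src):
    """Round-trip reprojection check between a reference and a source view.

    Assumes pinhole intrinsics K (3x3) and world-to-camera extrinsics
    T = [R | t] (4x4), i.e. x_cam = R @ x_world + t.
    """
    H, W = depth_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

    # Lift reference pixels to 3D camera coordinates, then to world coordinates.
    pts_cam = np.linalg.inv(K_ref) @ (pix * depth_ref.reshape(1, -1))
    pts_w = T_ref[:3, :3].T @ (pts_cam - T_ref[:3, 3:4])

    # Project into the source view and sample its depth map (nearest neighbor).
    pts_src = T_src[:3, :3] @ pts_w + T_src[:3, 3:4]
    proj = K_src @ pts_src
    pix_src = proj[:2] / np.clip(proj[2:], 1e-8, None)
    us = np.clip(np.round(pix_src[0]).astype(int), 0, W - 1)
    vs = np.clip(np.round(pix_src[1]).astype(int), 0, H - 1)
    d_src = depth_src[vs, us]

    # Reproject the sampled source depth back into the reference view.
    pix_src_h = np.vstack([pix_src, np.ones((1, pix_src.shape[1]))])
    pts_back = np.linalg.inv(K_src) @ (pix_src_h * d_src)
    pts_back_w = T_src[:3, :3].T @ (pts_back - T_src[:3, 3:4])
    pts_back_ref = T_ref[:3, :3] @ pts_back_w + T_ref[:3, 3:4]
    proj_back = K_ref @ pts_back_ref
    pix_back = proj_back[:2] / np.clip(proj_back[2:], 1e-8, None)

    # Round-trip pixel error and relative depth error per reference pixel.
    pix_err = np.linalg.norm(pix_back - pix[:2], axis=0).reshape(H, W)
    rel_depth = np.abs(pts_back_ref[2] - depth_ref.reshape(-1))
    rel_depth = rel_depth / np.clip(depth_ref.reshape(-1), 1e-8, None)
    return pix_err, rel_depth.reshape(H, W)

def consistent_mask(pix_err, depth_err, pix_thresh=1.0, depth_thresh=0.01):
    # Illustrative thresholds: keep a pixel if its round trip stays within
    # one pixel and one percent relative depth deviation.
    return (pix_err < pix_thresh) & (depth_err < depth_thresh)
```

In an aggregation pass of this kind, a point predicted by the generalizable model would be kept only if it passes this check against a minimum number of source views, and the surviving per-view point clouds are merged to initialize per-scene optimization; the exact thresholds and voting rule used by MVSGaussian may differ.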