Generalizable 3D Gaussian Splatting reconstruction enables advanced image-to-3D content creation but demands substantial computational resources and large datasets, making it difficult to train models from scratch. Existing methods typically entangle the prediction of 3D Gaussian geometry and appearance, which makes them rely heavily on data-driven priors and results in slow regression. To address this, we propose \method, a disentangled framework for efficient 3D Gaussian prediction. Our method extracts features from local image pairs using a stereo vision backbone and fuses them via global attention blocks. Dedicated point and Gaussian prediction heads generate multi-view point-maps for geometry and Gaussian features for appearance, which are combined as GS-maps to represent the 3DGS object. A refinement network then enhances these GS-maps for high-quality reconstruction. Unlike existing methods that depend on camera parameters, our approach achieves pose-free 3D reconstruction, improving robustness and practicality. By reducing resource demands while maintaining high-quality outputs, \method offers an efficient, scalable solution for real-world 3D content generation.
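To make the disentangled pipeline concrete, the following is a minimal PyTorch sketch of the prediction flow described above: a shared backbone over image pairs, global attention fusion across views, and separate point (geometry) and Gaussian-feature (appearance) heads whose outputs are concatenated into per-pixel GS-maps. All names and dimensions here (the `GSMapPredictor` class, the toy conv backbone, `feat_dim=256`, the 11-dimensional Gaussian parameterization) are illustrative assumptions rather than the paper's actual architecture, and the refinement network is omitted.

```python
import torch
import torch.nn as nn


class GSMapPredictor(nn.Module):
    """Illustrative sketch: shared backbone, global attention fusion, and
    disentangled geometry/appearance heads producing per-pixel GS-maps.
    Module sizes are assumptions, not the paper's configuration."""

    def __init__(self, feat_dim=256, num_heads=8):
        super().__init__()
        # Stand-in for the stereo vision backbone; a toy conv encoder that
        # patchifies each view with stride-8 downsampling.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=8, stride=8),
            nn.GELU(),
        )
        # Global attention block fusing tokens across all views.
        self.fusion = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True
        )
        # Geometry branch: per-pixel 3D point-map (x, y, z).
        self.point_head = nn.Linear(feat_dim, 3)
        # Appearance branch: per-pixel Gaussian features, here assumed to be
        # 3 scale + 4 rotation quaternion + 1 opacity + 3 color = 11 dims.
        self.gaussian_head = nn.Linear(feat_dim, 11)

    def forward(self, images):
        # images: (B, V, 3, H, W) -- V views, no camera poses required.
        B, V, _, H, W = images.shape
        feats = self.backbone(images.flatten(0, 1))   # (B*V, D, h, w)
        D, h, w = feats.shape[1:]
        tokens = feats.flatten(2).transpose(1, 2)     # (B*V, h*w, D)
        tokens = tokens.reshape(B, V * h * w, D)      # pool tokens across views
        tokens = self.fusion(tokens)                  # global attention fusion
        points = self.point_head(tokens)              # geometry prediction
        gauss = self.gaussian_head(tokens)            # appearance prediction
        # GS-map: geometry and appearance predicted separately, then combined.
        return torch.cat([points, gauss], dim=-1)     # (B, V*h*w, 14)


if __name__ == "__main__":
    model = GSMapPredictor()
    views = torch.randn(2, 4, 3, 64, 64)  # 2 scenes, 4 unposed views each
    gs_map = model(views)
    print(gs_map.shape)  # torch.Size([2, 256, 14])
```

Note the design choice this sketch mirrors: because geometry and appearance flow through separate heads after a shared fused representation, each branch can be supervised and regressed independently, which is what allows the disentangled framework to avoid the heavy data-driven priors of entangled prediction.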