We present a novel approach, termed ADGaussian, for generalizable street scene reconstruction. The proposed method enables high-quality rendering from only a single input view. Unlike prior Gaussian Splatting methods that primarily focus on geometry refinement, we emphasize the importance of jointly optimizing image and depth features for accurate Gaussian prediction. To this end, we first incorporate sparse LiDAR depth as an additional input modality, formulating Gaussian prediction as a joint learning framework over visual information and geometric cues. Furthermore, we propose a Multi-modal Feature Matching strategy coupled with a Multi-scale Gaussian Decoding model to enhance the joint refinement of multi-modal features, thereby enabling efficient multi-modal Gaussian learning. Extensive experiments on Waymo and KITTI demonstrate that ADGaussian achieves state-of-the-art performance and exhibits superior zero-shot generalization under novel-view shifts.
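To make the multi-modal prediction concrete, the sketch below shows one plausible shape of such a pipeline: per-pixel image and depth features are fused, then decoded into per-pixel 3D Gaussian parameters. This is a minimal illustrative sketch only; the feature dimensions, the linear "decoder", and the 14-parameter Gaussian layout are assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps for one view (H x W grid, C-dim features).
H, W, C = 4, 4, 8
img_feat = rng.standard_normal((H, W, C))    # features from an image encoder
depth_feat = rng.standard_normal((H, W, C))  # features from sparse LiDAR depth

# Joint treatment of both modalities: fuse per pixel, then decode each
# fused feature into the parameters of one 3D Gaussian.
fused = np.concatenate([img_feat, depth_feat], axis=-1)  # (H, W, 2C)

# A single linear map standing in for a learned Gaussian decoder.
# Per Gaussian: 3 (mean) + 3 (scale) + 4 (rotation quat) + 1 (opacity)
# + 3 (color) = 14 parameters.
W_dec = rng.standard_normal((2 * C, 14)) * 0.1
params = fused.reshape(-1, 2 * C) @ W_dec    # (H*W, 14)

means = params[:, 0:3]
scales = np.exp(params[:, 3:6])              # positivity via exp
quats = params[:, 6:10]
quats /= np.linalg.norm(quats, axis=1, keepdims=True)   # unit quaternions
opacity = 1.0 / (1.0 + np.exp(-params[:, 10]))          # sigmoid to (0, 1)
colors = 1.0 / (1.0 + np.exp(-params[:, 11:14]))

print(params.shape)  # one 14-parameter Gaussian per pixel
```

The activations (exp for scales, normalization for quaternions, sigmoid for opacity and color) mirror the standard parameterization used by Gaussian Splatting pipelines, so the decoded values are always valid Gaussian primitives regardless of the raw network output.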