Generalizable 3D Gaussian Splatting aims to predict Gaussian parameters directly with a feed-forward network for scene reconstruction. Among these parameters, the Gaussian means are particularly difficult to predict, so depth is usually estimated first and then unprojected to obtain the Gaussian centers. Existing methods typically rely on a single warp to estimate depth probability, which prevents them from fully exploiting cross-view geometric cues and yields unstable, coarse depth maps. To address this limitation, we propose IDESplat, which iteratively applies warp operations to boost depth probability estimation for accurate Gaussian mean prediction. First, to eliminate the inherent instability of a single warp, we introduce a Depth Probability Boosting Unit (DPBU) that multiplicatively integrates the epipolar attention maps produced by cascaded warp operations. Next, we build an iterative depth estimation process by stacking multiple DPBUs, progressively identifying high-likelihood depth candidates. As IDESplat iteratively boosts the depth probability estimates and updates the depth candidates, the depth map is gradually refined, yielding accurate Gaussian means. We conduct experiments on RealEstate10K (RE10K), ACID, and DL3DV. IDESplat achieves outstanding reconstruction quality and state-of-the-art performance with real-time efficiency. On RE10K, it outperforms DepthSplat by 0.33 dB in PSNR while using only 10.7% of its parameters and 70% of its memory. In cross-dataset experiments, IDESplat also improves PSNR by 2.95 dB over DepthSplat on DTU, demonstrating strong generalization.
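The abstract does not give implementation details, so the following is only a minimal, hypothetical sketch of the ideas it names, reduced to a single pixel: several per-warp depth-probability distributions over depth candidates are fused by elementwise product (the multiplicative integration attributed to the DPBU), stacked units re-center a narrower candidate range on the current estimate, and the final depth is unprojected with a pinhole model to give a Gaussian mean. Everything here is an assumption rather than IDESplat's method: `toy_match_score` is a synthetic stand-in for real epipolar attention over feature maps, and the candidate count, expected-depth readout, and range-narrowing rule are illustrative choices. The camera-to-world transform is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boost(prob_maps):
    """Fuse several depth-probability distributions by elementwise product.

    A candidate keeps high probability only if every warp supports it,
    so the renormalized result is sharper than the individual inputs.
    """
    fused = [1.0] * len(prob_maps[0])
    for probs in prob_maps:
        fused = [f * p for f, p in zip(fused, probs)]
    total = sum(fused)
    return [f / total for f in fused]


def toy_match_score(depth, true_depth, sharpness):
    # HYPOTHETICAL stand-in for an epipolar-attention matching score:
    # larger when the candidate depth is close to the scene's true depth.
    return -sharpness * (depth - true_depth) ** 2


def iterative_depth(near, far, num_candidates, num_units, score_fns):
    """Stack `num_units` boosting steps, narrowing the candidate range each time."""
    lo, hi = near, far
    depth = (near + far) / 2
    for _ in range(num_units):
        step = (hi - lo) / (num_candidates - 1)
        candidates = [lo + i * step for i in range(num_candidates)]
        # One probability distribution per (simulated) warp.
        prob_maps = [softmax([fn(d) for d in candidates]) for fn in score_fns]
        probs = boost(prob_maps)
        # Read out the expected depth under the fused distribution.
        depth = sum(p * d for p, d in zip(probs, candidates))
        # Assumed update rule: re-center a narrower range on the estimate.
        lo, hi = max(near, depth - step), min(far, depth + step)
    return depth


def unproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole unprojection of pixel (u, v) at `depth` to a camera-space point."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


if __name__ == "__main__":
    # Three simulated warps whose peaks disagree slightly around depth 3.0.
    score_fns = [lambda d, b=b: toy_match_score(d, 3.0 + b, 4.0)
                 for b in (-0.05, 0.0, 0.05)]
    depth = iterative_depth(1.0, 10.0, 16, 3, score_fns)
    print(f"estimated depth: {depth:.3f}")
    print("Gaussian mean (camera space):",
          unproject(400.0, 240.0, depth, 500.0, 500.0, 320.0, 240.0))
```

In this toy setting, multiplying normalized distributions is equivalent to a softmax over the summed scores, so a candidate that any single warp rejects is suppressed; averaging would instead let one confident but wrong warp keep a spurious peak alive. Whether IDESplat's fusion behaves exactly this way is not stated in the abstract.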