Feed-forward 3D Gaussian Splatting (3DGS) removes the need for time-consuming per-scene optimization required by traditional 3DGS. However, existing feed-forward approaches struggle with real-world photo collections that include diverse lighting conditions and transient objects. In this paper, we present Wild3R, a feed-forward approach for unconstrained sparse photo collections. The main bottleneck is the lack of training data that provides multiple viewpoints, a variety of illuminations, and transient variations necessary for learning robust scene representations. To address this, we introduce the WildCity dataset, which comprises 200 scenes, 170 lighting conditions, and transient objects, resulting in 337,500 images in total. By leveraging the dataset, our model learns appearance consistency across viewpoints conditioned on reference views, while removing transient content. Extensive experiments demonstrate that our method outperforms existing feed-forward approaches and achieves results competitive with prior per-scene optimization-based methods.