Despite the impressive performance of Multi-view Stereo (MVS) approaches when plenty of training samples are available, their performance degradation when generalizing to unseen domains has not been clearly explored. In this work, we focus on the domain generalization problem in MVS. To evaluate generalization, we build a novel MVS domain generalization benchmark comprising both synthetic and real-world datasets. In contrast to conventional domain generalization benchmarks, we consider a more realistic but challenging scenario in which only a single source domain is available for training. The MVS problem can be cast as a feature matching task, and maintaining robust feature consistency across views is an important factor for improving generalization performance. To address the domain generalization problem in MVS, we propose a novel MVS framework, namely RobustMVS. A Depth-Clustering-guided Whitening (DCW) loss is further introduced to preserve feature consistency across views: it decorrelates multi-view features from viewpoint-specific style information based on geometric priors derived from depth maps. Experimental results show that our method achieves superior performance on the domain generalization benchmark.
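The DCW loss decorrelates features via whitening. As a rough, hypothetical illustration only (not the paper's exact formulation, which additionally groups pixels by depth clusters and targets viewpoint-specific style), a generic whitening-style penalty on the off-diagonal entries of the channel covariance can be sketched as follows; the function name and interface are assumptions for this sketch:

```python
import numpy as np

def whitening_loss(features):
    """Illustrative whitening-style penalty (assumption, not the paper's DCW loss).

    features: (N, C) array of N feature vectors with C channels.
    Penalizes the squared off-diagonal entries of the channel covariance,
    pushing channels toward decorrelation.
    """
    # Center the features per channel
    f = features - features.mean(axis=0, keepdims=True)
    # Channel covariance matrix, shape (C, C)
    cov = f.T @ f / (f.shape[0] - 1)
    # Zero out the diagonal; only cross-channel correlations are penalized
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2))
```

Minimizing such a term drives the covariance toward a diagonal matrix, so that no channel's activation is predictable from another; the actual DCW loss applies this idea under geometric guidance from depth maps.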