With advances in domain-generalized stereo matching networks, models pre-trained on synthetic data show strong robustness to unseen domains. However, few studies have investigated their robustness after fine-tuning in real-world scenarios, where the domain generalization ability can degrade severely. In this paper, we explore how to fine-tune stereo matching networks without compromising their robustness to unseen domains. Our motivation stems from comparing Ground Truth (GT) and Pseudo Label (PL) supervision for fine-tuning: GT degrades the domain generalization ability, whereas PL preserves it. Empirically, we find that the difference between GT and PL contains valuable information that can regularize networks during fine-tuning. We propose a framework that exploits this difference, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network. The core idea is to use the EMA Teacher to measure what the Student has learned and to dynamically improve GT and PL for fine-tuning. We integrate our framework with state-of-the-art networks and evaluate its effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the domain generalization ability during fine-tuning.
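To make the roles of the three networks concrete, the following is a minimal sketch of how an EMA Teacher can track a Student while a frozen Teacher keeps the pre-trained weights. The abstract does not state the update rule or its hyperparameters, so the standard exponential moving average form, the function name `ema_update`, the `momentum` value, and the toy parameter dictionaries are all assumptions made for illustration. How the framework then uses the EMA Teacher to refine GT and PL is not specified in the abstract and is not shown here.

```python
from typing import Dict, List

# Toy stand-in for network weights: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def ema_update(ema_teacher: Params, student: Params, momentum: float = 0.999) -> Params:
    """Return new EMA Teacher weights: momentum * teacher + (1 - momentum) * student.

    The default momentum is a hypothetical value, not one taken from the paper.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return {
        name: [
            momentum * t + (1.0 - momentum) * s
            for t, s in zip(ema_teacher[name], student[name])
        ]
        for name in ema_teacher
    }


# Both teachers start from the same (hypothetical) pre-trained weights.
pretrained: Params = {"w": [1.0, 0.0]}
frozen_teacher: Params = {k: list(v) for k, v in pretrained.items()}  # never updated
ema_teacher: Params = {k: list(v) for k, v in pretrained.items()}

# Pretend a fine-tuning step has moved the Student away from the pre-trained weights.
student: Params = {"w": [0.0, 1.0]}

# The EMA Teacher drifts toward the Student; the frozen Teacher stays put.
ema_teacher = ema_update(ema_teacher, student, momentum=0.5)
```

With a momentum close to 1, the EMA Teacher changes slowly and acts as a smoothed record of what the Student has learned so far, while the frozen Teacher remains a fixed reference to the pre-trained model.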