Photometric loss and pseudo-label-based self-training are two widely used methods for training stereo networks on unlabeled data. However, both struggle to provide accurate supervision in occluded regions: photometric loss has no valid correspondences there, and the pseudo labels used in self-training are often unreliable. To overcome these limitations, we present S$^3$, a simple yet effective framework based on multi-baseline geometry consistency. Unlike conventional self-training, in which the teacher and student share the same stereo pair, S$^3$ assigns them different target images, introducing a natural visibility asymmetry. Regions occluded in the student's view often remain visible and matchable in the teacher's view, enabling reliable pseudo labels even where photometric supervision fails. The teacher's disparities are rescaled to match the student's baseline and used to guide student learning. We further propose an occlusion-aware weighting strategy that mitigates unreliable supervision in teacher-occluded regions and encourages the student to learn robust occlusion completion. To support training, we construct MBS20K, a multi-baseline stereo dataset synthesized with the CARLA simulator. Extensive experiments demonstrate that S$^3$ provides effective supervision in both occluded and non-occluded regions, achieves strong generalization performance, and surpasses previous state-of-the-art methods on the KITTI 2015 and 2012 benchmarks.
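The baseline rescaling and occlusion-aware weighting described in the abstract can be illustrated with a minimal NumPy sketch. It is not the S$^3$ implementation. It assumes rectified, collinear cameras that share one focal length and one reference image, so that disparity obeys $d = fB/Z$ and scales linearly with the baseline $B$. The function names, the binary `teacher_occluded` mask, and the constant `occ_weight` are hypothetical choices made for illustration; the abstract does not specify the actual weighting scheme.

```python
import numpy as np


def rescale_teacher_disparity(teacher_disp, teacher_baseline, student_baseline):
    """Rescale a teacher disparity map to the student's baseline.

    Assumption: both stereo pairs share the reference view and focal length,
    so d = f * B / Z and d_student = d_teacher * (B_student / B_teacher).
    """
    if teacher_baseline <= 0 or student_baseline <= 0:
        raise ValueError("baselines must be positive")
    scale = student_baseline / teacher_baseline
    return np.asarray(teacher_disp, dtype=np.float64) * scale


def weighted_l1_loss(student_disp, pseudo_label, teacher_occluded, occ_weight=0.5):
    """Weighted mean L1 error between student output and pseudo labels.

    Hypothetical weighting: pixels flagged as occluded in the teacher's view
    receive weight `occ_weight`; all other pixels receive weight 1.
    """
    student_disp = np.asarray(student_disp, dtype=np.float64)
    pseudo_label = np.asarray(pseudo_label, dtype=np.float64)
    weights = np.where(np.asarray(teacher_occluded, dtype=bool), occ_weight, 1.0)
    abs_err = np.abs(student_disp - pseudo_label)
    return float((weights * abs_err).sum() / weights.sum())


# Toy example: the teacher pair has twice the student's baseline.
teacher_disp = np.array([[8.0, 4.0]])
pseudo = rescale_teacher_disparity(teacher_disp, teacher_baseline=0.4, student_baseline=0.2)
student_pred = np.array([[5.0, 2.0]])
occluded_in_teacher = np.array([[False, True]])
loss = weighted_l1_loss(student_pred, pseudo, occluded_in_teacher)
```

In this toy case the teacher's disparities are halved before being compared with the student's prediction, and the pixel flagged as teacher-occluded contributes to the loss with a reduced weight.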