Stereo matching and semantic segmentation are key tasks in binocular satellite 3D reconstruction. However, previous studies have mostly treated them as independent, parallel tasks, without an integrated multitask learning framework. This work introduces the Single-branch Semantic Stereo Network (S3Net), which combines semantic segmentation and stereo matching through Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that use semantic or disparity information independently, our method identifies and leverages the intrinsic link between the two tasks, leading to more accurate semantic understanding and disparity estimation. Comparative experiments on the US3D dataset demonstrate the effectiveness of S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and from 1.439 to 1.403, respectively, surpassing existing competitive methods. Our code is available at: https://github.com/CVEO/S3Net.
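The three metrics reported above (mIoU, D1-Error, EPE) can be illustrated with a minimal sketch. This is not taken from the S3Net repository: the function names, the flattened-list inputs, the 3-pixel D1 threshold, and the convention of skipping classes absent from both prediction and ground truth are all assumptions chosen for illustration, and the evaluation protocol actually used on US3D may differ.

```python
from typing import Optional, Sequence, Tuple


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union in percent over flattened label maps.

    Assumption: classes absent from both pred and target are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return 100.0 * sum(ious) / len(ious)


def disparity_errors(
    pred: Sequence[float],
    target: Sequence[float],
    threshold: float = 3.0,           # assumed D1 threshold, in pixels
    invalid: Optional[float] = None,  # hypothetical marker for unlabeled pixels
) -> Tuple[float, float]:
    """Return (EPE in pixels, D1-Error in percent) over valid pixels."""
    errors = [
        abs(p - t)
        for p, t in zip(pred, target)
        if invalid is None or t != invalid
    ]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    epe = sum(errors) / len(errors)
    d1 = 100.0 * sum(1 for e in errors if e > threshold) / len(errors)
    return epe, d1


if __name__ == "__main__":
    # Toy inputs only; these are not values from the paper.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
    print(disparity_errors([1.0, 2.0, 10.0, 4.0], [1.0, 3.0, 5.0, 4.0]))
```

Under these assumptions, a higher mIoU and lower D1-Error and EPE indicate better performance, which matches the direction of the improvements reported above.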