This paper addresses semantic segmentation in computer vision, the task of assigning a class label to every pixel. We investigate jointly training models for semantic edge detection and semantic segmentation, an approach that has shown promise. However, the cross-task consistency that multi-task networks learn implicitly is limited. To address this, we propose a novel "decoupled cross-task consistency loss" that explicitly enhances consistency between the two tasks. Our semantic segmentation network, TriangleNet, improves mean Intersection over Union (mIoU) on the Cityscapes test set by 2.88\% over the Baseline. Notably, TriangleNet reaches 77.4\% mIoU at 46.2 FPS on Cityscapes, demonstrating real-time inference at full resolution. Multi-scale inference raises mIoU further to 77.8\%. TriangleNet also consistently outperforms the Baseline on the FloodNet dataset, demonstrating robust generalization. These results underscore the value of multi-task learning with explicit cross-task consistency for advancing semantic segmentation, and highlight the potential of multi-task learning for real-time semantic segmentation.
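The abstract does not define the decoupled cross-task consistency loss. The sketch below illustrates only the general idea of an *explicit* consistency term between the two tasks. It derives per-class soft boundaries from the segmentation probabilities and applies an L1 penalty wherever they disagree with the predicted semantic edge map. This is a generic, hypothetical formulation and not TriangleNet's loss. The function names are illustrative, NumPy stands in for a training framework, and the choice of forward differences and an L1 penalty is an assumption.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along `axis`."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def boundaries_from_segmentation(probs):
    """Hypothetical per-class soft boundary map from probabilities of shape (C, H, W).

    A pixel is marked as a class-c boundary when p_c changes between it and
    its right or lower neighbour (forward differences). Because probabilities
    lie in [0, 1], each boundary value also lies in [0, 1].
    """
    dx = np.zeros_like(probs)
    dy = np.zeros_like(probs)
    dx[:, :, :-1] = np.abs(probs[:, :, 1:] - probs[:, :, :-1])
    dy[:, :-1, :] = np.abs(probs[:, 1:, :] - probs[:, :-1, :])
    return np.maximum(dx, dy)


def consistency_loss(seg_logits, edge_probs):
    """Hypothetical explicit consistency term: mean L1 distance between the
    semantic edge prediction (C, H, W) and boundaries implied by the
    segmentation logits (C, H, W)."""
    probs = softmax(seg_logits, axis=0)
    implied_edges = boundaries_from_segmentation(probs)
    return float(np.mean(np.abs(edge_probs - implied_edges)))
```

As a usage example, take a 4x4 two-class image whose left half is confidently class 0 and right half confidently class 1. The loss is close to 0 when the edge map marks column 1, where the label changes, for both classes. It is about 0.25 when the edge map is empty. In real training, such a term would be written in an autodiff framework so that gradients reach both the segmentation and the edge heads.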