This paper proposes DMSA, an end-to-end unsupervised semantic segmentation architecture trained with four loss functions. The framework uses an Atrous Spatial Pyramid Pooling (ASPP) module to enhance feature extraction, together with a dynamic dilation strategy designed to better capture multi-scale context information. A Pixel-Adaptive Refinement (PAR) module is then introduced, which adaptively refines the initial pseudo labels after feature fusion to obtain high-quality pseudo labels. Experiments show that the proposed DMSA framework outperforms existing methods on the saliency dataset. On the COCO 80 dataset, mIoU improves by 2.0 and accuracy by 5.39. On the Pascal VOC 2012 Augmented dataset, mIoU improves by 4.9 and accuracy by 3.4. In addition, the model converges considerably faster once the PAR module is introduced.
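For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) and pixel accuracy are commonly computed from a confusion matrix. It is not taken from the paper: the function names and the toy labels are illustrative assumptions, and the exact evaluation protocol used for DMSA is not stated here. In unsupervised segmentation, predicted cluster IDs are typically matched to ground-truth classes before scoring; the sketch assumes that matching has already been done.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the ground-truth class, columns the prediction."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the ground truth."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def mean_iou(matrix: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # class c pixels that were missed
        fp = sum(matrix[r][c] for r in range(n)) - tp  # other pixels labelled as c
        union = tp + fp + fn
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy example: six flattened pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
print(f"accuracy = {pixel_accuracy(cm):.4f}, mIoU = {mean_iou(cm):.4f}")
```

Because mIoU averages over classes while pixel accuracy averages over pixels, the two can move by different amounts on the same dataset, which is consistent with the differing gains reported for each benchmark.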