Weakly supervised semantic segmentation (WSSS) based on image-level labels is challenging since it is hard to obtain complete semantic regions. To address this issue, we propose a self-training method that utilizes fused multi-scale class-aware attention maps. Our observation is that attention maps of different scales contain rich complementary information, especially for large and small objects. Therefore, we collect information from attention maps of different scales and obtain multi-scale attention maps. We then apply denoising and reactivation strategies to enhance the potential regions and reduce noisy areas. Finally, we use the refined attention maps to retrain the network. Experiments show that our method enables the model to extract rich semantic information from multi-scale images and achieves 72.4% mIoU on both the PASCAL VOC 2012 validation and test sets. The code is available at https://bupt-ai-cz.github.io/SMAF.
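The fusion and refinement steps summarized above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the abstract does not specify the fusion rule, the denoising criterion, or the reactivation function, so the averaging, the relative threshold `noise_thresh`, and the per-class min-max rescaling below are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import numpy as np


def resize_nearest(cam, out_h, out_w):
    """Nearest-neighbour resize of a (C, h, w) array to (C, out_h, out_w)."""
    _, h, w = cam.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return cam[:, rows[:, None], cols[None, :]]


def fuse_multiscale(cams, out_h, out_w):
    """Bring per-scale class-aware maps to one resolution and average them.

    Averaging is an assumed fusion rule, not necessarily the paper's.
    """
    resized = [resize_nearest(c, out_h, out_w) for c in cams]
    return np.mean(resized, axis=0)


def denoise_and_reactivate(fused, noise_thresh=0.2, eps=1e-8):
    """Suppress weak responses, then rescale each class map to [0, 1].

    Both steps are illustrative stand-ins for the denoising and
    reactivation strategies named in the abstract.
    """
    peak = fused.max(axis=(1, 2), keepdims=True)
    # Assumed denoising: drop activations below a fraction of the class peak.
    kept = np.where(fused >= noise_thresh * peak, fused, 0.0)
    # Assumed reactivation: per-class min-max normalization.
    lo = kept.min(axis=(1, 2), keepdims=True)
    hi = kept.max(axis=(1, 2), keepdims=True)
    return (kept - lo) / (hi - lo + eps)
```

In a self-training loop, maps refined in this way would serve as the supervision signal when the network is retrained, as described above.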