Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation

Weakly supervised semantic segmentation (WSSS) models that rely on class activation maps (CAMs) have achieved desirable performance compared to their non-CAM-based counterparts. However, to make the WSSS task feasible, pseudo labels must be generated by expanding the seeds from CAMs, which is complex and time-consuming and thus hinders the design of efficient end-to-end (single-stage) WSSS approaches. To tackle this dilemma, we resort to off-the-shelf and readily accessible saliency maps to obtain pseudo labels directly, given the image-level class labels. Nevertheless, the salient regions may contain noisy labels and may not fit the target objects seamlessly, and saliency maps can only serve as approximate pseudo labels for simple images containing single-class objects. As a result, a segmentation model trained on these simple images cannot generalize well to complex images containing multi-class objects. To this end, we propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy-label and multi-class generalization issues. Specifically, we propose online noise filtering and progressive noise detection modules to tackle image-level and pixel-level noise, respectively. Moreover, a bidirectional alignment mechanism is proposed to reduce the data distribution gap in both the input and output spaces, with simple-to-complex image synthesis and complex-to-simple adversarial learning. MDBA reaches an mIoU of 69.5\% and 70.2\% on the validation and test sets of the PASCAL VOC 2012 dataset, respectively. The source code and models are available at \url{https://github.com/NUST-Machine-Intelligence-Laboratory/MDBA}.
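As an illustration of how a saliency map can be turned into a pseudo label for a simple single-class image, the sketch below thresholds the saliency values and assigns the image-level class to the salient pixels. It is a minimal sketch and not the MDBA implementation: the function name `saliency_to_pseudo_label`, the threshold of 0.5, and the background index 0 are assumptions made for this example only.

```python
import numpy as np


def saliency_to_pseudo_label(saliency, class_id, threshold=0.5, background_id=0):
    """Turn a saliency map with values in [0, 1] into a pixel-level pseudo label.

    Hypothetical helper: pixels whose saliency exceeds `threshold` receive the
    image-level `class_id`; all remaining pixels receive `background_id`.
    The threshold and background index are assumptions, not values from the paper.
    """
    saliency = np.asarray(saliency, dtype=np.float32)
    if saliency.ndim != 2:
        raise ValueError("saliency must be a 2-D (H, W) array")
    # Start from an all-background label map, then mark the salient pixels.
    label = np.full(saliency.shape, background_id, dtype=np.int64)
    label[saliency > threshold] = class_id
    return label


# Toy 2x2 saliency map for an image whose only image-level label is class 15.
saliency = np.array([[0.1, 0.8],
                     [0.6, 0.2]])
pseudo = saliency_to_pseudo_label(saliency, class_id=15)
```

A label produced this way inherits every error in the saliency map, which is the kind of noise the denoising modules described above are meant to address. The rule also applies only when the image carries a single class label, since a saliency map alone cannot separate several classes.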
