Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation

Most existing weakly supervised semantic segmentation (WSSS) methods rely on Class Activation Mapping (CAM) to extract coarse class-specific localization maps from image-level labels. Prior works have commonly used an offline heuristic thresholding process that combines the CAMs with off-the-shelf saliency maps, produced by a general pre-trained saliency model, to obtain more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework that exploits the rich information in these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation. In AuxSegNet+, saliency detection and multi-label image classification serve as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism that learns pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module that learns both pairwise and unary affinities, which enhance the task-specific features and predictions by aggregating query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate the CAMs, providing better pseudo labels for both tasks. Cross-task affinity learning and pseudo-label updating together enable iterative improvement of segmentation performance. Extensive experiments demonstrate the effectiveness of the proposed approach, which achieves new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks.
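To make the idea of refining CAMs with a learned pairwise affinity more concrete, the following is a minimal NumPy sketch, not the AuxSegNet+ implementation. It assumes that the pairwise affinity of each task is a softmax-normalized dot product between pixel features, that the cross-task affinity is the plain average of the two task-specific affinities, and that refinement is a few steps of matrix propagation. All function names, the scaling factor, and the number of iterations are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pairwise_affinity(feat):
    """Map a (C, H, W) feature map to a row-stochastic (HW, HW) affinity.

    Assumption: affinity is a scaled dot product between pixel features,
    normalized with a softmax over all positions (query-dependent context).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T              # (HW, C), one row per pixel
    return softmax(x @ x.T / np.sqrt(c), axis=1)


def cross_task_affinity(seg_feat, sal_feat):
    """Fuse segmentation and saliency affinities (assumed: simple average)."""
    return 0.5 * (pairwise_affinity(seg_feat) + pairwise_affinity(sal_feat))


def refine_cam(cam, affinity, n_iters=2):
    """Propagate (K, H, W) class activation maps along the affinity.

    Each refined activation is an affinity-weighted average of the
    activations at all positions, applied n_iters times.
    """
    k, h, w = cam.shape
    flat = cam.reshape(k, h * w)              # (K, HW)
    for _ in range(n_iters):
        flat = flat @ affinity.T              # refined[k, i] = sum_j A[i, j] * cam[k, j]
    return flat.reshape(k, h, w)
```

Because each row of the affinity sums to one, every propagation step replaces an activation with a convex combination of activations from related pixels, so the refined maps stay within the range of the input CAMs while spreading activation across regions with similar features.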
