Seed area generation is usually the starting point of weakly supervised semantic segmentation (WSSS). Computing the Class Activation Map (CAM) from a multi-label classification network is the de facto paradigm for seed area generation. However, CAMs generated from Convolutional Neural Networks (CNNs) tend to be under-activated, whereas those generated from Transformers tend to be over-activated. As a result, strategies for refining CAMs from CNNs are usually inappropriate for Transformers, and vice versa. In this paper, we propose a Unified optimization paradigm for Seed Area GEneration (USAGE) for both types of networks, in which the objective function to be optimized consists of two terms. The first is a generation loss, which controls the shape of seed areas through a temperature parameter whose value follows a deterministic principle for each type of network. The second is a regularization loss, which enforces consistency between the seed areas generated from different views by self-adaptive network adjustment, so as to correct false activations in the seed areas. Experimental results show that USAGE consistently improves seed area generation for both CNNs and Transformers by large margins, e.g., outperforming state-of-the-art methods by 4.1% mIoU on PASCAL VOC. Moreover, based on the USAGE-generated seed areas on Transformers, we achieve state-of-the-art WSSS results on both PASCAL VOC and MS COCO.
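The abstract does not give the exact form of either loss term. As a rough illustration only, the following Python sketch shows the generic ingredients it names: a CAM computed as a class-weighted sum of feature channels, a temperature-scaled assignment of pixels to seed areas, and a consistency penalty between the seed areas of two views. It is not USAGE itself. The function names, the fixed `background` score, the softmax form of the generation step, and the use of a horizontal flip as the second view with an L1 penalty are all assumptions made for this example, and the self-adaptive network adjustment is not modeled.

```python
import math


def compute_cam(features, class_weights):
    """CAM for one class: ReLU of the class-weighted sum of feature channels,
    rescaled so its maximum is 1 (a common CAM normalization).
    features: K x H x W nested lists; class_weights: K classifier weights."""
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    cam = [[max(0.0, sum(class_weights[k] * features[k][i][j]
                         for k in range(num_channels)))
            for j in range(width)]
           for i in range(height)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def seed_probabilities(cams, background=0.5, tau=1.0):
    """Hypothetical generation step: per-pixel softmax over
    [background, class_1, ..., class_C] with temperature tau.
    Smaller tau gives sharper (more nearly binary) seed areas."""
    height, width = len(cams[0]), len(cams[0][0])
    probs = []
    for i in range(height):
        row = []
        for j in range(width):
            scores = [background] + [cam[i][j] for cam in cams]
            m = max(scores)  # subtract the max for numerical stability
            exps = [math.exp((s - m) / tau) for s in scores]
            total = sum(exps)
            row.append([e / total for e in exps])
        probs.append(row)
    return probs


def hflip(grid):
    """Mirror an H x W grid left to right."""
    return [list(reversed(row)) for row in grid]


def consistency_loss(probs_view_a, probs_view_b):
    """Assumed regularizer: mean per-pixel L1 distance between the seed
    probabilities of view A and those of view B (a horizontally flipped
    copy of the input), after flipping B back to A's orientation."""
    aligned_b = hflip(probs_view_b)
    total, count = 0.0, 0
    for row_a, row_b in zip(probs_view_a, aligned_b):
        for p_a, p_b in zip(row_a, row_b):
            total += sum(abs(x - y) for x, y in zip(p_a, p_b))
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy 1-channel, 1 x 3 feature map and its horizontally flipped view.
    cam_a = compute_cam([[[0.0, 1.0, 2.0]]], [1.0])
    cam_b = compute_cam([[[2.0, 1.0, 0.0]]], [1.0])
    for tau in (1.0, 0.1):
        seeds_a = seed_probabilities([cam_a], tau=tau)
        seeds_b = seed_probabilities([cam_b], tau=tau)
        # Foreground probability per pixel, then the cross-view loss.
        print(tau, [round(p[1], 3) for p in seeds_a[0]],
              consistency_loss(seeds_a, seeds_b))
```

With `tau` below 1 the softmax sharpens, pushing each pixel toward its highest-scoring label; with `tau` above 1 it flattens, which in this toy setup spreads probability more evenly between foreground and background. Because the toy CAM here is exactly flip-equivariant, the consistency loss is zero; a real network's views would generally disagree, and that disagreement is what a regularization term of this kind penalizes.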