This study introduces Masked Collaborative Contrast (MCC), an approach that emphasizes semantic regions in weakly supervised semantic segmentation. MCC combines ideas from masked image modeling and contrastive learning to design Transformer blocks that induce keys to contract toward semantically relevant regions. Unlike prevalent techniques that generate masks by directly removing patch regions from the input image, we examine the neighborhood relations of patch tokens by exploring masks applied to the keys in the affinity matrix. Moreover, we generate positive and negative samples for contrastive learning by contrasting the masked local output with the global output. Extensive experiments on commonly used datasets show that the proposed MCC mechanism effectively aligns global and local perspectives within the image and achieves strong performance.
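The two ideas named above, masking keys on the affinity matrix and contrasting the masked local output with the global output, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and everything beyond the abstract is an assumption made for illustration: queries, keys, and values are the raw patch tokens with no learned projections, keys are masked at random with a keep ratio of 0.5, outputs are mean-pooled into one vector per image, the masked output of the same image is treated as the positive, the masked outputs of other images are treated as negatives, and the loss is a standard InfoNCE. The names `attention`, `info_nce`, and `key_mask` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, key_mask=None):
    """Scaled dot-product attention; key_mask[j] == False hides key j from every query."""
    affinity = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) patch-to-patch affinity matrix
    if key_mask is not None:
        # Mask on the affinity matrix (assumption: a per-key boolean mask),
        # rather than deleting patches from the input image.
        affinity = np.where(key_mask[None, :], affinity, -np.inf)
    return softmax(affinity, axis=-1) @ v

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor (assumed here; the abstract names no loss)."""
    unit = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = unit(anchor), unit(positive), unit(negatives)
    logits = np.concatenate(([a @ p], n @ a)) / temperature  # positive logit first
    logits = logits - logits.max()
    return float(np.log(np.exp(logits).sum()) - logits[0])

rng = np.random.default_rng(0)
num_images, num_patches, dim = 4, 16, 8
tokens = rng.normal(size=(num_images, num_patches, dim))  # stand-in patch tokens

global_vecs, local_vecs = [], []
for x in tokens:
    keep = rng.random(num_patches) < 0.5        # assumed random key mask
    keep[rng.integers(num_patches)] = True      # guarantee at least one visible key
    global_vecs.append(attention(x, x, x).mean(axis=0))                 # global view
    local_vecs.append(attention(x, x, x, key_mask=keep).mean(axis=0))   # masked local view
global_vecs, local_vecs = np.stack(global_vecs), np.stack(local_vecs)

# Assumed pairing: same-image local view is the positive, other images are negatives.
loss = np.mean([
    info_nce(global_vecs[i], local_vecs[i], np.delete(local_vecs, i, axis=0))
    for i in range(num_images)
])
print(f"contrastive loss: {loss:.4f}")
```

Because the mask is applied to the affinity matrix, hidden keys receive exactly zero attention weight while every query token is still present in the output, which is the contrast with removing patches from the input image that the abstract draws.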