This study introduces Masked Collaborative Contrast (MCC), an approach that emphasizes semantic regions in weakly supervised semantic segmentation. MCC combines ideas from masked image modeling and contrastive learning to design Transformer blocks that induce keys to contract toward semantically relevant regions. Unlike prevalent techniques that generate masks by directly erasing patch regions in the input image, we examine the neighborhood relations of patch tokens by applying masks to the keys in the affinity matrix. Moreover, we generate positive and negative samples for contrastive learning by contrasting the masked local output with the global output. Extensive experiments on commonly used datasets demonstrate that the proposed MCC mechanism effectively aligns global and local perspectives within the image and achieves strong performance. The source code is available at \url{https://github.com/fwu11/MCC}.
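The two ingredients named in the abstract, masking keys on the affinity matrix and contrasting the masked local output with the global output, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The single-head attention, the hand-picked `key_mask`, the InfoNCE-style loss, and the choice of "same token index" as the positive pair are all assumptions made only for illustration, as are the function names `attention` and `info_nce`.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, key_mask=None):
    """Single-head attention over token lists.

    key_mask[j] == False hides key j from every query by setting its
    affinity score to -inf, so the input tokens themselves stay untouched.
    At least one key must remain visible.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if key_mask is not None:
            scores = [s if keep else -math.inf for s, keep in zip(scores, key_mask)]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def info_nce(local, global_, temperature=0.1):
    """Assumed pairing: local token i is the positive for global token i,
    and all other global tokens serve as negatives."""
    n = len(local)
    loss = 0.0
    for i in range(n):
        logits = [cosine(local[i], global_[j]) / temperature for j in range(n)]
        loss -= math.log(softmax(logits)[i])
    return loss / n


random.seed(0)
n_tokens, dim = 6, 4
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
key_mask = [True, True, False, True, False, True]  # hypothetical mask over keys

global_out = attention(tokens, tokens, tokens)            # unmasked (global) view
local_out = attention(tokens, tokens, tokens, key_mask)   # key-masked (local) view
loss = info_nce(local_out, global_out)
```

Minimizing a loss of this kind pulls each masked local output toward its global counterpart, which is one plausible reading of "aligning global and local perspectives"; the actual sampling of masks and of positive and negative pairs in MCC may differ.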