Most previous co-salient object detection methods focus on extracting co-salient cues by mining consistency relations across images, while ignoring the explicit exploration of background regions. In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT), built on several economical multi-grained correlation modules, to explicitly mine both co-saliency and background information and to effectively model the discrimination between them. Specifically, we first propose a region-to-region correlation module that introduces inter-image relations into pixel-wise segmentation features while maintaining computational efficiency. Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules. We also design a token-guided feature refinement module that enhances the discriminability of the segmentation features under the guidance of the learned tokens. Segmentation feature extraction and token construction are performed iteratively, so that each promotes the other. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method. The source code is available at: https://github.com/dragonlee258079/DMT.
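To make the efficiency argument behind region-to-region correlation concrete, the following is a minimal NumPy sketch and not the DMT implementation. It assumes the module can be approximated by average-pooling each image's pixel features into an r x r grid of regions, applying softmax attention among all regions of all images, and adding the upsampled result back to the pixel features. The function names `region_pool` and `region_to_region_correlation`, the pooling scheme, and the residual connection are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np

def region_pool(feats, r):
    """Average-pool (N, C, H, W) features into an r x r grid -> (N, r*r, C).

    Hypothetical helper: the pooling scheme is an assumption, not taken from DMT.
    """
    n, c, h, w = feats.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    # Split H and W into r blocks each, then average inside every block.
    blocks = feats.reshape(n, c, r, h // r, r, w // r)
    pooled = blocks.mean(axis=(3, 5))                       # (N, C, r, r)
    return pooled.reshape(n, c, r * r).transpose(0, 2, 1)   # (N, r*r, C)

def region_to_region_correlation(feats, r):
    """Inject inter-image relations into pixel features via region-level attention.

    Assumed design: attention over N*r*r region descriptors instead of
    N*H*W pixels, followed by nearest-neighbour upsampling and a residual add.
    """
    n, c, h, w = feats.shape
    regions = region_pool(feats, r).reshape(n * r * r, c)   # regions of all images

    # Scaled dot-product affinity between every pair of regions, across images.
    scores = regions @ regions.T / np.sqrt(c)               # (N*r*r, N*r*r)
    scores -= scores.max(axis=1, keepdims=True)             # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)           # row-wise softmax

    mixed = weights @ regions                               # (N*r*r, C)
    mixed = mixed.reshape(n, r, r, c).transpose(0, 3, 1, 2) # (N, C, r, r)

    # Nearest-neighbour upsampling back to the pixel grid.
    up = mixed.repeat(h // r, axis=2).repeat(w // r, axis=3)
    return feats + up                                       # residual connection

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group = rng.standard_normal((5, 16, 32, 32))  # 5 images in one group
    out = region_to_region_correlation(group, r=4)
    print(out.shape)
    # Affinity matrix entries: region-level vs. a full pixel-to-pixel alternative.
    print((5 * 4 * 4) ** 2, (5 * 32 * 32) ** 2)
```

Under these assumptions, the affinity matrix has (N·r²)² entries rather than the (N·H·W)² entries a full pixel-to-pixel correlation across the group would need, which is the kind of saving the phrase "while maintaining computational efficiency" points to.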