Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion

Given a group of images, co-salient object detection (CoSOD) aims to highlight the common salient object in each image. Two factors are closely tied to success on this task: extracting the group consensus, and dispersing that consensus to each image. Most previous works represent the group consensus using local features; we instead use a hierarchical Transformer module to extract semantic-level consensus. This yields a more comprehensive representation of the common object category and excludes interference from other objects that share local similarities with the target object. In addition, we propose a Transformer-based dispersion module that accounts for the variation of the co-salient object across scenes. It distributes the consensus to the image feature maps in an image-specific way while making full use of interactions within the group. These two modules are integrated with a ViT encoder and an FPN-like decoder to form an end-to-end trainable network, without additional branches or auxiliary losses. The proposed method is evaluated on three commonly used CoSOD datasets and achieves state-of-the-art performance.
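
The abstract gives no architectural details, so the following is only a rough illustration of the two ideas it names, not the authors' modules. It assumes plain single-head cross-attention without learned projections, and all function names, shapes, and the random "consensus queries" are hypothetical. Consensus extraction is sketched as a few query tokens attending over the tokens of every image in the group, and dispersion as each image's tokens attending back to those consensus tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product attention: (Q, d), (K, d), (K, d) -> (Q, d).
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values

def extract_consensus(group_tokens, consensus_queries):
    # Hypothetical consensus extraction: pool tokens from ALL images in the
    # group into a small set of consensus tokens.
    # group_tokens: (N images, T tokens, d); consensus_queries: (C, d) -> (C, d)
    n, t, d = group_tokens.shape
    flat = group_tokens.reshape(n * t, d)
    return cross_attention(consensus_queries, flat, flat)

def disperse_consensus(group_tokens, consensus):
    # Hypothetical dispersion: each image's tokens attend to the consensus
    # tokens, so the injected signal depends on that image's own content.
    attended = np.stack(
        [cross_attention(img, consensus, consensus) for img in group_tokens]
    )
    return group_tokens + attended  # residual connection

rng = np.random.default_rng(0)
group_tokens = rng.normal(size=(5, 16, 8))   # 5 images, 16 tokens each, dim 8
consensus_queries = rng.normal(size=(2, 8))  # stand-in for learned queries

consensus = extract_consensus(group_tokens, consensus_queries)
enhanced = disperse_consensus(group_tokens, consensus)
print(consensus.shape, enhanced.shape)  # (2, 8) (5, 16, 8)
```

In this sketch the consensus tokens do not depend on the order of images in the group, while the dispersion step produces a different update for each image, which mirrors the "image-specific" distribution described above. A real model would add learned projections, multiple heads, and the hierarchical structure the abstract mentions.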

