SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation

Referring Expression Segmentation (RES) aims to produce a segmentation mask for the target object in an image that a text (i.e., a referring expression) refers to. Existing methods require large-scale mask annotations and do not generalize well to unseen/zero-shot scenarios. To address these issues, we propose a weakly-supervised bootstrapping architecture for RES with several new algorithmic innovations. To the best of our knowledge, ours is the first approach that considers only a fraction of both mask and box annotations (shown in Figure 1 and Table 1) for training. To enable principled training of models in such low-annotation settings, improve image-text region-level alignment, and further enhance spatial localization of the target object in the image, we propose a Cross-modal Fusion with Attention Consistency module. For automatic pseudo-labeling of unlabeled samples, we introduce a novel Mask Validity Filtering routine based on a spatially aware zero-shot proposal scoring approach. Extensive experiments show that with just 30% annotations, our model SafaRi achieves 59.31 and 48.26 mIoU on RefCOCO+ testA and RefCOCO+ testB, respectively, compared to 58.93 and 48.19 mIoU obtained by the fully-supervised SOTA method SeqTR. SafaRi also outperforms SeqTR by 11.7% (on RefCOCO+ testA) and 19.6% (on RefCOCO+ testB) in a fully-supervised setting and demonstrates strong generalization capabilities in unseen/zero-shot tasks.
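
The results above are reported as mIoU. As a minimal sketch, assuming mIoU here means the intersection-over-union between each predicted and ground-truth binary mask, averaged over the evaluation samples (the abstract does not spell out the definition), it could be computed as follows. The names `mask_iou` and `mean_iou` are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Binary mask represented as rows of 0/1 values (an illustrative choice).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-sample IoU over paired predictions and targets."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        raise ValueError("at least one sample is required")
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    # One overlapping pixel out of two pixels in the union.
    print(mask_iou(pred, target))
```

Averaging per-sample scores, rather than pooling pixels across the whole dataset, keeps small objects from being dominated by large ones; which convention the reported numbers follow is not stated in the abstract.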

