Fully supervised salient object detection (SOD) methods have made considerable progress, yet they rely heavily on expensive pixel-wise labels. To balance labeling cost against performance, scribble-based SOD methods have recently attracted increasing attention. Previous models, however, learn the SOD task directly from small-scale SOD training data alone. Because weak scribble annotations and such small-scale training data provide limited information, it is extremely difficult for these models to understand the image and, in turn, to perform SOD well. In this paper, we propose a simple yet effective framework for scribble-based SOD that is guided by general visual representations, which simulate the general cognition of humans. The framework consists of a task-related encoder, a general visual module, and an information integration module, which efficiently combine the general visual representations learned from large-scale unlabeled datasets with task-related features, so that SOD is performed with an understanding of the contextual connections within images. In addition, we propose a novel global semantic affinity loss that guides the model to perceive the global structure of salient objects. Experimental results on five public benchmark datasets demonstrate that our method, which uses only scribble annotations and introduces no extra labels, outperforms state-of-the-art weakly supervised SOD methods and is comparable or even superior to state-of-the-art fully supervised models.