Salient Object Detection (SOD) aims to identify and segment the most prominent regions of a scene. Traditional fully supervised models rely on manual pixel-level annotations, which are time-consuming and expensive to produce. To address this challenge, we develop a low-cost, high-precision annotation method that leverages large foundation models. Specifically, we adopt a weakly supervised approach in which textual prompts guide a large model to generate pseudo-labels. Because large models do not reliably focus on the salient regions of an image, we manually annotate a subset of text prompts to fine-tune the model. Building on this approach, which enables fast and precise pseudo-label generation, we introduce a new dataset, BDS-TR. Compared with the earlier DUTS-TR dataset, BDS-TR is substantially larger and covers a wider variety of categories and scenes. This expansion broadens our model's applicability and provides a more comprehensive foundational dataset for future SOD research. Additionally, we present an edge decoder based on dynamic upsampling, which focuses on object edges while progressively restoring the resolution of image features. Comprehensive experiments on five benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches and also surpasses several existing fully supervised SOD methods. The code and results will be made available.
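As a minimal sketch of how text-prompted pseudo-label generation can work, the snippet below uses CLIPSeg (a publicly available text-promptable segmentation model in Hugging Face `transformers`) as a hypothetical stand-in; the foundation model actually used here is not named in this abstract, and the prompt string, threshold, and file name are illustrative placeholders.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Hypothetical stand-in for the (unnamed) foundation model: CLIPSeg
# produces a segmentation heat map from a free-form text prompt.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

@torch.no_grad()
def pseudo_label(image: Image.Image, prompt: str, threshold: float = 0.5) -> torch.Tensor:
    """Turn a text prompt into a binary saliency pseudo-mask for one image."""
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")
    logits = model(**inputs).logits
    logits = logits.reshape(1, 1, *logits.shape[-2:])   # (1, 1, h, w) heat map
    # Upsample the probability map to the original resolution, then binarize.
    probs = F.interpolate(torch.sigmoid(logits), size=image.size[::-1],
                          mode="bilinear", align_corners=False)
    return probs.squeeze() > threshold                  # (H, W) boolean mask

# Placeholder file name and prompt; per the approach above, prompts for a
# subset of images would first be manually annotated to fine-tune the model.
img = Image.open("example.jpg").convert("RGB")
mask = pseudo_label(img, "the most salient object in the scene")
```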
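The following is a minimal sketch of the dynamic-upsampling idea behind the edge decoder, assuming a DySample-style point sampler in PyTorch: a lightweight convolution predicts per-pixel sampling offsets, and the low-resolution features are resampled on the offset-shifted grid, letting the upsampler adapt to object boundaries instead of applying a fixed bilinear kernel. The module name, channel width, offset scaling, and initialization are assumptions, not the exact design used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicUpsample(nn.Module):
    """DySample-style dynamic upsampling: a 1x1 conv predicts per-pixel
    sampling offsets, and low-resolution features are resampled at the
    offset-shifted grid positions with grid_sample."""
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Two offset channels (x, y) for each of the scale**2 sub-pixel positions.
        self.offset = nn.Conv2d(channels, 2 * scale * scale, kernel_size=1)
        nn.init.zeros_(self.offset.weight)  # start as plain grid resampling
        nn.init.zeros_(self.offset.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        s = self.scale
        # Rearrange offsets onto the high-resolution grid: (B, 2, H*s, W*s),
        # expressed directly in normalized [-1, 1] coordinates (0.25 is an
        # assumed scale factor that keeps early offsets small).
        offset = F.pixel_shuffle(0.25 * self.offset(x), s)
        # Base sampling grid in normalized coordinates.
        ys = torch.linspace(-1, 1, h * s, device=x.device)
        xs = torch.linspace(-1, 1, w * s, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack((gx, gy), dim=0).unsqueeze(0)  # (1, 2, H*s, W*s)
        grid = (base + offset).permute(0, 2, 3, 1)        # (B, H*s, W*s, 2)
        return F.grid_sample(x, grid, align_corners=True)

# Example: upsample a decoder feature map from 28x28 to 56x56.
feat = torch.randn(1, 64, 28, 28)
up = DynamicUpsample(channels=64, scale=2)
print(up(feat).shape)  # torch.Size([1, 64, 56, 56])
```

With the offset branch initialized to zero, the module behaves like ordinary grid resampling at the start of training and only learns boundary-aware sampling shifts as supervision from the edge decoder accumulates.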