Salient Object Detection (SOD) aims to identify and segment the most prominent regions of a scene. Traditional fully supervised models rely on manual pixel-level annotations, which are time-consuming and expensive to produce. To address this challenge, we develop a low-cost, high-precision annotation method that leverages large foundation models. Specifically, we adopt a weakly supervised approach in which textual prompts guide a large model to generate pseudo-labels. Because large models do not reliably focus on the salient regions of an image, we manually annotate a subset of text prompts to fine-tune the model. Building on this approach, which enables fast and precise pseudo-label generation, we introduce a new dataset, BDS-TR. Compared with the earlier DUTS-TR dataset, BDS-TR is substantially larger and covers a wider variety of categories and scenes. This expansion broadens our model's applicability and provides a more comprehensive foundational dataset for future SOD research. Additionally, we present an edge decoder based on dynamic upsampling, which focuses on object edges while progressively restoring the resolution of image features. Comprehensive experiments on five benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches and also surpasses several existing fully supervised SOD methods. The code and results will be made available.
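As a minimal sketch of how text-prompted pseudo-label generation can work, the snippet below uses CLIPSeg (a publicly available text-promptable segmentation model in Hugging Face `transformers`) as a hypothetical stand-in; the foundation model actually used here is not named in this abstract, and the prompt string, threshold, and file name are illustrative placeholders.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Hypothetical stand-in for the (unnamed) foundation model: CLIPSeg
# produces a segmentation heat map from a free-form text prompt.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

@torch.no_grad()
def pseudo_label(image: Image.Image, prompt: str, threshold: float = 0.5) -> torch.Tensor:
    """Turn a text prompt into a binary saliency pseudo-mask for one image."""
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")
    logits = model(**inputs).logits
    logits = logits.reshape(1, 1, *logits.shape[-2:])   # (1, 1, h, w) heat map
    # Upsample the probability map to the original resolution, then binarize.
    probs = F.interpolate(torch.sigmoid(logits), size=image.size[::-1],
                          mode="bilinear", align_corners=False)
    return probs.squeeze() > threshold                  # (H, W) boolean mask

# Placeholder file name and prompt; per the approach above, prompts for a
# subset of images would first be manually annotated to fine-tune the model.
img = Image.open("example.jpg").convert("RGB")
mask = pseudo_label(img, "the most salient object in the scene")
```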
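The following is a minimal sketch of the dynamic-upsampling idea behind the edge decoder, assuming a DySample-style point sampler in PyTorch: a lightweight convolution predicts per-pixel sampling offsets, and the low-resolution features are resampled on the offset-shifted grid, letting the upsampler adapt to object boundaries instead of applying a fixed bilinear kernel. The module name, channel width, offset scaling, and initialization are assumptions, not the exact design used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicUpsample(nn.Module):
    """DySample-style dynamic upsampling: a 1x1 conv predicts per-pixel
    sampling offsets, and low-resolution features are resampled at the
    offset-shifted grid positions with grid_sample."""
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Two offset channels (x, y) for each of the scale**2 sub-pixel positions.
        self.offset = nn.Conv2d(channels, 2 * scale * scale, kernel_size=1)
        nn.init.zeros_(self.offset.weight)  # start as plain grid resampling
        nn.init.zeros_(self.offset.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        s = self.scale
        # Rearrange offsets onto the high-resolution grid: (B, 2, H*s, W*s),
        # expressed directly in normalized [-1, 1] coordinates (0.25 is an
        # assumed scale factor that keeps early offsets small).
        offset = F.pixel_shuffle(0.25 * self.offset(x), s)
        # Base sampling grid in normalized coordinates.
        ys = torch.linspace(-1, 1, h * s, device=x.device)
        xs = torch.linspace(-1, 1, w * s, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack((gx, gy), dim=0).unsqueeze(0)  # (1, 2, H*s, W*s)
        grid = (base + offset).permute(0, 2, 3, 1)        # (B, H*s, W*s, 2)
        return F.grid_sample(x, grid, align_corners=True)

# Example: upsample a decoder feature map from 28x28 to 56x56.
feat = torch.randn(1, 64, 28, 28)
up = DynamicUpsample(channels=64, scale=2)
print(up(feat).shape)  # torch.Size([1, 64, 56, 56])
```

With the offset branch initialized to zero, the module behaves like ordinary grid resampling at the start of training and only learns boundary-aware sampling shifts as supervision from the edge decoder accumulates.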