Weakly supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has received significant attention. Existing approaches mainly focus on generating high-quality pseudo labels from the available images and their image-level labels. However, pseudo-label quality degrades significantly when the available dataset is small. In this paper, we therefore tackle the problem from a different perspective and introduce a novel data augmentation approach called GPT-Prompt Controlled Diffusion (GPCD). GPCD enlarges existing labeled datasets with diverse images generated by controlled diffusion guided by GPT prompts. In this process, the existing images and image-level labels provide the control information, while GPT enriches the prompts so that the generated images have diverse backgrounds. Moreover, we integrate data-source information into the Vision Transformer (ViT) framework as tokens. These tokens are designed to help the downstream WSSS framework recognize the origin of each image, that is, whether it is original or augmented. GPCD surpasses existing state-of-the-art methods, and its advantage is most pronounced when little training data is available, demonstrating the effectiveness of our method.
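The data-source tokens can be pictured with a minimal sketch. It assumes one learnable embedding per source (hypothetical IDs `REAL = 0` for original images, `SYNTHETIC = 1` for diffusion-generated ones), inserted right after the `[CLS]` token before positional embeddings are added. The function `build_token_sequence` and all shapes are illustrative assumptions, not the paper's implementation; GPCD may inject source information differently (for example, by adding it to every patch token).

```python
import numpy as np

# Hypothetical data-source IDs (assumption, not from the paper).
REAL, SYNTHETIC = 0, 1


def build_token_sequence(patch_tokens, cls_token, source_table, pos_embed, source_id):
    """Prepend a [CLS] token and a data-source token to one image's patch tokens.

    patch_tokens: (N, D) patch embeddings
    cls_token:    (D,) learnable class token
    source_table: (S, D) one learnable embedding per data source (assumed design)
    pos_embed:    (N + 2, D) positional embeddings covering CLS + source + patches
    source_id:    row index into source_table
    """
    n, d = patch_tokens.shape
    if pos_embed.shape != (n + 2, d):
        raise ValueError("pos_embed must have shape (N + 2, D)")
    source_token = source_table[source_id]
    # Sequence order: [CLS], [SOURCE], patch_1 ... patch_N
    tokens = np.vstack([cls_token[None, :], source_token[None, :], patch_tokens])
    return tokens + pos_embed


# Toy example: a 14x14 patch grid with 64-dim embeddings and random weights.
rng = np.random.default_rng(0)
N, D = 196, 64
patches = rng.standard_normal((N, D))
cls = rng.standard_normal(D)
sources = rng.standard_normal((2, D))
pos = rng.standard_normal((N + 2, D))

real_seq = build_token_sequence(patches, cls, sources, pos, REAL)
synth_seq = build_token_sequence(patches, cls, sources, pos, SYNTHETIC)
print(real_seq.shape)  # (198, 64)
```

In this sketch the two sequences differ only in the source-token position, so self-attention layers can condition on whether an image is original or generated without altering the patch content.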