Detecting glass regions is challenging because transparency and reflection make their appearance inherently ambiguous. Current solutions remain rooted in the conventional deep learning paradigm, which requires constructing annotated datasets and designing network architectures. The evident drawbacks of these mainstream solutions are the time-consuming, labor-intensive process of curating datasets and the growing complexity of model structures. In this paper, we address these issues by fully harnessing the capabilities of two existing vision foundation models (VFMs): Stable Diffusion and the Segment Anything Model (SAM). First, we use Stable Diffusion to construct a Synthetic but photorealistic large-scale Glass Surface Detection dataset, dubbed S-GSD, without any labor cost. The dataset comprises four different scales, with 168k images in total, all with precise masks. Second, building on the powerful segmentation ability of SAM, we devise a simple Glass surface sEgMentor named GEM, which follows a query-based encoder-decoder architecture. We conduct comprehensive experiments on the large-scale glass segmentation dataset GSD-S. With the help of these two VFMs, GEM establishes a new state of the art, surpassing the best previously reported method, GlassSemNet, with an IoU improvement of 2.1%. Additionally, extensive experiments demonstrate that our synthetic dataset S-GSD yields strong performance in zero-shot and transfer learning settings. Code, datasets, and models are publicly available at: https://github.com/isbrycee/GEM