Image Segmentation in Foundation Model Era: A Survey

Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary segmentation methods have entered a new epoch, either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image segmentation or developing dedicated segmentation foundation models (e.g., SAM). These approaches not only deliver superior segmentation performance, but also exhibit segmentation capabilities previously unseen in the deep learning context. However, current research in image segmentation lacks a detailed analysis of the distinct characteristics, challenges, and solutions associated with these advancements. This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered around FM-driven image segmentation. We investigate two basic lines of research -- generic image segmentation (i.e., semantic segmentation, instance segmentation, panoptic segmentation) and promptable image segmentation (i.e., interactive segmentation, referring segmentation, few-shot segmentation) -- by delineating their respective task settings, background concepts, and key challenges. Furthermore, we provide insights into the emergence of segmentation knowledge from FMs like CLIP, Stable Diffusion, and DINO. An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts. Subsequently, we discuss open issues and potential avenues for future research. We envisage that this fresh, comprehensive, and systematic survey will catalyze the evolution of advanced image segmentation systems.
