Recent advances in large-scale pre-training have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks from point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle to select suitable prompts, require scenario-specific hyperparameter settings, and suffer prolonged one-shot inference times due to overuse of SAM, resulting in low efficiency and limited automation. To address these issues, we propose a simple yet effective approach based on graph analysis. In particular, a Positive-Negative Alignment module dynamically selects point prompts for mask generation, notably exploiting the potential of the background context as a negative reference. A subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by efficiently decomposing the weakly connected components of the directed graph, yielding distinct natural clusters. Finally, positive and overshooting gating, benefiting from graph-based granularity alignment, aggregates high-confidence masks and filters out false-positive masks for the final prediction, reducing reliance on additional hyperparameters and redundant mask generation. Extensive experiments on standard FSS, One-shot Part Segmentation, and Cross-Domain FSS benchmarks validate the effectiveness and efficiency of the proposed approach, which surpasses state-of-the-art generalist models with a mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i. The code is available at https://andyzaq.github.io/GF-SAM/.
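The clustering step described above can be illustrated with a minimal sketch: build a directed graph with an edge from each mask to every point it covers, then decompose its weakly connected components (direction is ignored for connectivity) to obtain point clusters. The data layout and union-find helper below are hypothetical simplifications for illustration, not the paper's implementation.

```python
def weakly_connected_clusters(num_points, mask_coverage):
    """Cluster point prompts by mask coverage.

    mask_coverage: list of sets, where mask_coverage[m] holds the indices
    of the points that mask m covers (a directed edge mask -> point).
    Returns a list of point-index sets, one per weakly connected component.
    """
    # Nodes 0..num_points-1 are points; nodes num_points.. are masks.
    n = num_points + len(mask_coverage)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Weak connectivity treats each directed edge as undirected.
    for m, covered in enumerate(mask_coverage):
        for p in covered:
            union(num_points + m, p)

    # Group points by their component root; masks are dropped from output.
    clusters = {}
    for p in range(num_points):
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())
```

For example, with five points and masks covering {0, 1}, {1, 2}, and {4}, points 0-2 merge into one cluster through the overlapping masks, while points 3 (uncovered) and 4 form singleton clusters.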