Unsupervised image segmentation is a critical task in computer vision. It enables dense scene understanding without human annotations, which is especially valuable in domains where labelled data is scarce. However, existing methods often struggle to reconcile global semantic structure with fine-grained boundary accuracy. This paper introduces DynaGuide, an adaptive segmentation framework that addresses these challenges through a novel dual-guidance strategy and dynamic loss optimization. Building on our previous work, DynaSeg, DynaGuide combines global pseudo-labels from zero-shot models such as DiffSeg or SegFormer with local boundary refinement using a lightweight CNN trained from scratch. This synergy allows the model to correct coarse or noisy global predictions and produce high-precision segmentations. At the heart of DynaGuide is a multi-component loss that dynamically balances feature similarity, Huber-smoothed spatial continuity (including diagonal neighbour relationships), and semantic alignment with the global pseudo-labels. Unlike prior approaches, DynaGuide trains entirely without ground-truth labels in the target domain and supports plug-and-play integration of diverse guidance sources. Extensive experiments on BSD500, PASCAL VOC2012, and COCO demonstrate that DynaGuide achieves state-of-the-art performance, improving mIoU by 17.5% on BSD500, 3.1% on PASCAL VOC2012, and 11.66% on COCO. With its modular design, strong generalization, and minimal computational footprint, DynaGuide offers a scalable and practical solution for unsupervised segmentation in real-world settings. Code available at: https://github.com/RyersonMultimediaLab/DynaGuide
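To make the loss description concrete, the Huber-smoothed spatial-continuity term described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: it assumes the per-pixel features form an (H, W, C) array, applies a Huber penalty to feature differences between the four neighbour directions (vertical, horizontal, and both diagonals), and the function names and the `delta` threshold are our own choices for illustration.

```python
import numpy as np

def huber(x, delta=1.0):
    """Elementwise Huber penalty: quadratic near zero, linear in the tails.

    This smooths the continuity term so that sharp but genuine boundaries
    are penalized linearly rather than quadratically.
    """
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * x ** 2, delta * (a - 0.5 * delta))

def spatial_continuity_loss(feat, delta=1.0):
    """Huber-smoothed continuity over 4 neighbour directions (illustrative).

    feat: (H, W, C) feature map produced by the segmentation network.
    Diagonal differences are included, matching the abstract's mention of
    diagonal relationships alongside horizontal/vertical ones.
    """
    dv  = feat[1:, :, :]  - feat[:-1, :, :]    # vertical neighbours
    dh  = feat[:, 1:, :]  - feat[:, :-1, :]    # horizontal neighbours
    dd1 = feat[1:, 1:, :] - feat[:-1, :-1, :]  # main-diagonal neighbours
    dd2 = feat[1:, :-1, :] - feat[:-1, 1:, :]  # anti-diagonal neighbours
    # Average the Huber penalty within each direction, then sum directions.
    return sum(huber(d, delta).mean() for d in (dv, dh, dd1, dd2))
```

In a full objective of the kind the abstract describes, this term would be one weighted component alongside a feature-similarity term and a pseudo-label alignment term, with the weights adjusted dynamically during training; a perfectly uniform feature map incurs zero continuity penalty, while noisy predictions are penalized smoothly.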