We present Markov Map Nearest Neighbor V2 (M2N2V2), a novel, simple yet effective approach that leverages depth guidance and attention maps for unsupervised, training-free, point-prompt-based interactive segmentation. Following recent trends in supervised multimodal approaches, we carefully integrate depth as an additional modality to create novel depth-guided Markov-maps. Furthermore, we observe that segment sizes in M2N2 occasionally fluctuate during the interactive process, which can decrease the overall mean Intersection over Union (mIoU). To mitigate this problem, we model the prompting as a sequential process and propose a novel adaptive score function that considers the previous segmentation and the current prompt point in order to prevent unreasonable changes in segment size. Using Stable Diffusion 2 and Depth Anything V2 as backbones, we empirically show that M2N2V2 significantly improves the Number of Clicks (NoC) and mIoU compared to M2N2 on all datasets except those from the medical domain. Interestingly, our unsupervised approach achieves NoC results competitive with supervised methods such as SAM and SimpleClick on the more challenging DAVIS and HQSeg44K datasets, reducing the gap between supervised and unsupervised methods.
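The two metrics named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `iou` and `number_of_clicks` are hypothetical, and the 0.85 IoU threshold and 20-click cap are assumptions based on common interactive-segmentation benchmark conventions, not values stated in this abstract.

```python
from typing import Sequence

# A binary mask as nested sequences of 0/1 values (hypothetical representation).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over Union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks agree perfectly; treating this case as 1.0 is a convention.
    return inter / union if union else 1.0


def number_of_clicks(
    ious_per_click: Sequence[float],
    threshold: float = 0.85,  # assumed target IoU
    max_clicks: int = 20,     # assumed click budget
) -> int:
    """Return the first click count whose IoU reaches the threshold.

    ious_per_click[k - 1] is the IoU after the k-th prompt point.
    If the threshold is never reached, the click budget is returned.
    """
    for k, value in enumerate(ious_per_click[:max_clicks], start=1):
        if value >= threshold:
            return k
    return max_clicks


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    gt = [[1, 1, 0], [0, 0, 0]]
    print(iou(pred, gt))                              # 2 shared pixels out of 3
    print(number_of_clicks([0.40, 0.70, 0.88, 0.90]))  # threshold reached at click 3
```

Under these assumptions, a lower NoC means fewer prompt points are needed to reach the target quality, while mIoU averages the per-image IoU after a fixed number of clicks. A segment that suddenly grows or shrinks between clicks lowers the IoU sequence, which is how size fluctuations can hurt both metrics.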