Segment Anything 3 (SAM3) provides a powerful foundation for robustly detecting, segmenting, and tracking specified targets in videos. However, its original implementation selects memory at the group level, which is suboptimal in complex multi-object scenarios: it makes one synchronized decision for all concurrent targets, conditioned on their average performance, and so often overlooks the reliability of individual targets. To address this, we propose SAM3-DMS, a training-free decoupled strategy that performs fine-grained memory selection for each object individually. Experiments demonstrate that our approach achieves robust identity preservation and tracking stability. Notably, its advantage becomes more pronounced as target density increases, establishing a solid foundation for simultaneous multi-target video segmentation in the wild.
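The contrast between a group-level decision and a decoupled one can be illustrated with a toy sketch. It is not the SAM3 or SAM3-DMS implementation: the abstract does not specify the selection criterion, so the per-object reliability scores in [0, 1], the fixed `threshold`, and both function names are assumptions made purely for illustration.

```python
from statistics import mean


def group_level_select(scores, threshold=0.5):
    """Hypothetical group-level rule: one synchronized decision for all
    objects, based on their average score."""
    commit = mean(scores) >= threshold
    # Every object receives the same decision, whatever its own score is.
    return [commit] * len(scores)


def decoupled_select(scores, threshold=0.5):
    """Hypothetical decoupled rule: each object is judged on its own score."""
    return [score >= threshold for score in scores]


# Assumed per-object reliability scores for one frame (illustrative values).
scores = [0.9, 0.8, 0.2]

print(group_level_select(scores))  # the unreliable third object is committed too
print(decoupled_select(scores))    # only the two reliable objects are committed
```

In this toy setting the averaged rule lets two confident objects mask an unreliable one, and conversely a few poor objects can block a reliable one from updating its memory. The per-object rule avoids both cases, which is consistent with the abstract's observation that the gap widens as the number of concurrent targets grows.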