This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims to segment the target object indicated by an initial mask throughout a video containing multiple shots. Existing VOS methods mainly target single-shot videos and struggle with shot discontinuities, which limits their real-world applicability. We propose a transition-mimicking data augmentation strategy (TMA), which enables cross-shot generalization using single-shot data and thereby alleviates the severe sparsity of annotated multi-shot data, and the Segment Anything Across Shots (SAAS) model, which can detect and comprehend shot transitions effectively. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed SAAS achieves state-of-the-art performance by effectively mimicking, understanding, and segmenting across complex transitions. The code and datasets are released at https://henghuiding.com/SAAS/.


