Video instance segmentation (VIS) requires detecting, segmenting, and tracking objects in videos, and it typically relies on costly video annotations. This paper introduces a method that eliminates the need for video annotations by training on image datasets instead. We adapt the PM-VIS algorithm so that it dynamically handles both bounding-box and instance-level pixel annotations. We introduce ImageNet-bbox to supplement object categories that are missing from video datasets, and we propose the PM-VIS+ algorithm, which adjusts the supervision signal according to the annotation type. To improve accuracy, we apply pseudo masks and semi-supervised optimization to unannotated video data. The method achieves strong video instance segmentation performance without any manual video annotation, offering a cost-effective solution and a new perspective for VIS applications. The code will be available at https://github.com/ldknight/PM-VIS-plus
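The abstract does not specify how PM-VIS+ switches its supervision by annotation type. Purely as an illustration, the following sketch assumes one common approach. Instances that have pixel masks receive a Dice mask loss. Box-only instances, such as those from ImageNet-bbox, receive a BoxInst-style projection loss, which compares the row and column maxima of the predicted mask with those of the box. The function names (`supervision_loss`, `box_projection_loss`, `dice_loss`), the annotation dictionary format, the `(x0, y0, x1, y1)` half-open box convention, and the choice of losses are all hypothetical. They are not taken from the authors' implementation.

```python
import numpy as np


def dice_loss(pred, target, eps=1e-6):
    """Dice loss between soft predictions in [0, 1] and a binary target (any shape)."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)


def box_projection_loss(pred, box):
    """Hypothetical box-only supervision in the style of the BoxInst projection loss.

    pred: (H, W) predicted mask probabilities.
    box:  (x0, y0, x1, y1) integer pixel coordinates, half-open (assumed convention).
    The max-projection of the predicted mask onto each axis should match the
    corresponding projection of the box.
    """
    x0, y0, x1, y1 = box
    box_mask = np.zeros_like(pred, dtype=float)
    box_mask[y0:y1, x0:x1] = 1.0
    loss_x = dice_loss(pred.max(axis=0), box_mask.max(axis=0))  # column projection
    loss_y = dice_loss(pred.max(axis=1), box_mask.max(axis=1))  # row projection
    return loss_x + loss_y


def supervision_loss(pred, annotation):
    """Hypothetical dispatcher: choose the loss according to the annotation type.

    annotation: {"type": "mask", "mask": (H, W) binary array}
             or {"type": "box", "box": (x0, y0, x1, y1)}
    """
    if annotation["type"] == "mask":
        # Full pixel-level supervision.
        return dice_loss(pred, annotation["mask"])
    if annotation["type"] == "box":
        # Weak supervision from a bounding box only.
        return box_projection_loss(pred, annotation["box"])
    raise ValueError(f"unknown annotation type: {annotation['type']!r}")
```

A projection loss alone is satisfied by any mask whose projections match the box. For this reason, box-supervised methods usually add further shape cues. BoxInst, for example, adds a pairwise color-similarity term. The pseudo masks mentioned in the abstract may plausibly serve a similar role, but this sketch does not model them.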