Complex video object segmentation serves as a fundamental task for a wide range of downstream applications such as video editing and automatic data annotation. Here we present the 2nd place solution in the MOSE track of PVUW 2024. To mitigate problems caused by tiny objects, similar objects, and fast movements in MOSE, we use instance segmentation to generate extra pretraining data from the validation and test sets of MOSE. The segmented instances are combined with objects extracted from COCO to augment the training data and enhance the semantic representation of the baseline model. In addition, motion blur is added during training to increase robustness against image blur induced by motion. Finally, we apply test-time augmentation (TTA) and a memory strategy at the inference stage. Our method ranked 2nd in the MOSE track of PVUW 2024, with a $\mathcal{J}$ of 0.8007, an $\mathcal{F}$ of 0.8683, and a $\mathcal{J}$\&$\mathcal{F}$ of 0.8345.
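The abstract does not specify how the motion blur is synthesized. A common approach, assumed here, is to filter each training frame with a linear motion-blur kernel of random length and direction. The sketch below uses NumPy only. The function names `motion_blur_kernel`, `apply_motion_blur`, and `random_motion_blur`, the length range, and the probability `p` are all hypothetical illustrations, not the authors' settings.

```python
import numpy as np


def motion_blur_kernel(length: int, angle_deg: float) -> np.ndarray:
    """Build a normalized linear motion-blur kernel of size length x length."""
    if length < 1:
        raise ValueError("length must be >= 1")
    kernel = np.zeros((length, length), dtype=np.float64)
    center = (length - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    # Sample points densely along a line segment through the kernel center
    # and mark every pixel the segment passes through.
    for t in np.linspace(-center, center, num=4 * length):
        x = int(round(center + t * np.cos(theta)))
        y = int(round(center - t * np.sin(theta)))  # image rows grow downward
        kernel[y, x] = 1.0
    # Normalize so the blur preserves overall brightness.
    return kernel / kernel.sum()


def apply_motion_blur(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Blur an HxW or HxWxC image with `kernel` (sliding weighted average, edge padding)."""
    kh, kw = kernel.shape
    pad_y, pad_x = kh // 2, kw // 2
    pad = [(pad_y, kh - 1 - pad_y), (pad_x, kw - 1 - pad_x)]
    pad += [(0, 0)] * (image.ndim - 2)  # never pad the channel axis
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    # Accumulate shifted copies of the image weighted by each kernel entry.
    for i in range(kh):
        for j in range(kw):
            if kernel[i, j] != 0.0:
                out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def random_motion_blur(image: np.ndarray, rng: np.random.Generator,
                       max_length: int = 15, p: float = 0.5) -> np.ndarray:
    """With probability p, blur with a random length and direction (hypothetical defaults)."""
    if rng.random() >= p:
        return image.astype(np.float64)
    length = int(rng.integers(3, max_length + 1))
    angle = float(rng.uniform(0.0, 180.0))
    return apply_motion_blur(image, motion_blur_kernel(length, angle))
```

Because the blur changes only appearance, the ground-truth masks for a blurred frame would be left untouched.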
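The abstract does not state which test-time augmentations are used. Horizontal flipping is one common choice and is assumed in the sketch below, which averages per-object probabilities from the original frame and its mirror image. `predict` is a hypothetical stand-in for the segmentation model and is treated as a per-frame function. In a memory-based VOS model, each augmented view would typically need its own memory bank, and this sketch does not model that.

```python
from typing import Callable

import numpy as np

# Hypothetical model signature: frame (H, W, C) -> per-object probabilities (K, H, W).
Predictor = Callable[[np.ndarray], np.ndarray]


def flip_tta(predict: Predictor, frame: np.ndarray) -> np.ndarray:
    """Average predictions over the original and horizontally flipped frame."""
    probs = predict(frame)
    # Run the model on the mirrored frame (flip the width axis).
    flipped_probs = predict(np.ascontiguousarray(frame[:, ::-1]))
    # Flip the prediction back so it is pixel-aligned with the original frame.
    aligned = flipped_probs[:, :, ::-1]
    return (probs + aligned) / 2.0


def tta_labels(predict: Predictor, frame: np.ndarray) -> np.ndarray:
    """Per-pixel object index after averaging the two views."""
    return np.argmax(flip_tta(predict, frame), axis=0)
```

Averaging probabilities rather than hard masks lets the two views disagree softly near object boundaries before the final argmax.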