The Associating Objects with Transformers (AOT) framework has shown strong performance across a wide range of challenging video object segmentation scenarios. In this study, we introduce MSDeAOT, a variant of the AOT series that incorporates transformers at multiple feature scales. Using the hierarchical Gated Propagation Module (GPM), MSDeAOT efficiently propagates object masks from previous frames to the current frame on a feature scale with a stride of 16. We additionally apply GPM at a finer feature scale with a stride of 8, which improves accuracy in detecting and tracking small objects. With test-time augmentation and model ensembling, our method ranks first in the EPIC-KITCHENS VISOR Semi-supervised Video Object Segmentation Challenge.
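To make the stride-16 versus stride-8 trade-off concrete, the short sketch below computes feature-map sizes and how many feature cells a small object spans at each stride. It is illustrative arithmetic only, not MSDeAOT code: the 480x864 input resolution, the 16x16-pixel object, and the ceil-division rounding are all hypothetical assumptions, and the helper names (`feature_grid`, `tokens`, `cells_covered`) are invented for this example.

```python
import math
from typing import Tuple

# Hypothetical input resolution (assumption); the resolution MSDeAOT actually uses is not stated here.
FRAME_H, FRAME_W = 480, 864


def feature_grid(height: int, width: int, stride: int) -> Tuple[int, int]:
    """Spatial size of a feature map downsampled by `stride` (ceil rounding is an assumption)."""
    return math.ceil(height / stride), math.ceil(width / stride)


def tokens(height: int, width: int, stride: int) -> int:
    """Number of spatial positions per frame that a propagation module would operate over."""
    h, w = feature_grid(height, width, stride)
    return h * w


def cells_covered(obj_h: int, obj_w: int, stride: int) -> float:
    """Approximate number of feature cells spanned by an obj_h x obj_w pixel box."""
    return (obj_h / stride) * (obj_w / stride)


if __name__ == "__main__":
    for stride in (16, 8):
        grid = feature_grid(FRAME_H, FRAME_W, stride)
        print(f"stride {stride}: grid {grid}, tokens {tokens(FRAME_H, FRAME_W, stride)}")
        # Hypothetical 16x16-pixel object, e.g. a small utensil in a kitchen scene.
        print(f"  16x16 object covers ~{cells_covered(16, 16, stride):.2f} cells")
```

Under these assumptions, a 16x16-pixel object occupies about one cell at stride 16 but about four cells at stride 8, which gives the mask more spatial support at the finer scale. The cost is that halving the stride quadruples the number of positions per frame (1620 versus 6480 here); this growth is one plausible reason to keep the main propagation at stride 16, although the abstract does not state the design motivation.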