Recent state-of-the-art semi-supervised Video Object Segmentation (VOS) methods have shown significant improvements in target-object segmentation accuracy by using information from preceding frames when segmenting the current frame. In particular, such memory-based approaches help a model handle appearance changes (representation drift) and occlusions more effectively. Ideally, for maximum performance, Online VOS methods would store all or most of the preceding frames (or information extracted from them) in memory and use them for online learning in later frames. Such a solution is not feasible for long videos, because the required memory grows without bound, and these methods can fail when memory is limited and the target object undergoes repeated representation drift over the course of a video. We propose two novel techniques that reduce the memory requirement of Online VOS methods while improving modeling accuracy and generalization on long videos. Motivated by the success of continual learning techniques in preserving previously learned knowledge, we propose Gated-Regularizer Continual Learning (GRCL), which improves the performance of any Online VOS method under limited memory, and Reconstruction-based Memory Selection Continual Learning (RMSCL), which enables Online VOS methods to make efficient use of the information stored in memory. We also analyze the performance of a hybrid combination of the two proposed methods. Experimental results show that the proposed methods improve the performance of Online VOS models by more than 8%, with improved robustness on long-video datasets, while maintaining comparable performance on short-video datasets such as DAVIS16, DAVIS17, and YouTube-VOS18.