Recent state-of-the-art semi-supervised Video Object Segmentation (VOS) methods have shown significant improvements in target object segmentation accuracy when information from preceding frames is used to segment the current frame. In particular, such memory-based approaches help a model handle appearance changes (representation drift) and occlusions more effectively. Ideally, for maximum performance, online VOS methods would store all or most of the preceding frames (or their extracted information) in memory and use them for online learning on subsequent frames. Such a solution is not feasible for long videos, as the required memory size would grow without bound. On the other hand, these methods can fail when memory is limited and a target object undergoes repeated representation drifts throughout a video. We propose two novel techniques to reduce the memory requirement of online VOS methods while improving modeling accuracy and generalization on long videos. Motivated by the success of continual learning techniques in preserving previously learned knowledge, we propose Gated-Regularizer Continual Learning (GRCL), which improves the performance of any online VOS method subject to limited memory, and Reconstruction-based Memory Selection Continual Learning (RMSCL), which enables online VOS methods to benefit efficiently from the information stored in memory. Experimental results show that the proposed methods improve the performance of online VOS models by up to 10% and boost their robustness on long-video datasets, while maintaining comparable performance on the short-video datasets DAVIS16 and DAVIS17.