Unsupervised Video Object Segmentation (UVOS) is the challenging task of segmenting the prominent object in a video without manual guidance. In other words, the network must accurately localize the target object in a sequence of RGB frames without prior knowledge. Recent UVOS methods fall into two categories: appearance-based and appearance-motion-based methods. Appearance-based methods use inter-frame correlation to capture the target object that commonly appears throughout a sequence. However, because they exploit correlation between randomly paired frames, these methods do not consider the motion of the target object. Appearance-motion-based methods, on the other hand, fuse appearance features from RGB frames with motion features from optical flow. Motion cues provide useful information, since salient objects typically show distinctive motion in a sequence. However, these approaches are limited by their dominant dependency on optical flow. In this paper, we propose a novel framework for UVOS that addresses the aforementioned limitations of both approaches in terms of both time and scale. Temporal Alignment Fusion aligns the saliency information of adjacent frames with the target frame so that information from adjacent frames can be leveraged. Scale Alignment Decoder predicts the target object mask precisely by aggregating differently scaled feature maps via continuous mapping with an implicit neural representation. We present experimental results on the public benchmark datasets DAVIS 2016 and FBMS, which demonstrate the effectiveness of our method. Furthermore, our method outperforms state-of-the-art methods on DAVIS 2016.
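To make the idea of aggregating differently scaled feature maps via continuous mapping more concrete, the following is a minimal NumPy sketch of one common way an implicit neural representation can be used for this purpose. It is an illustration under stated assumptions, not the Scale Alignment Decoder itself: the function names `bilinear_sample` and `decode_mask`, the two-layer MLP, the random weights, and the feature shapes are all hypothetical. Each feature map is queried at the same continuous coordinates in [0, 1]^2, the sampled features are concatenated with the coordinates, and a small MLP maps each query point to a foreground probability.

```python
import numpy as np


def bilinear_sample(feat, coords):
    """Sample a (C, H, W) feature map at continuous (y, x) coords in [0, 1]^2.

    coords has shape (N, 2); the result has shape (N, C).
    """
    _, h, w = feat.shape
    # Map normalized coordinates to pixel-index space.
    y = coords[:, 0] * (h - 1)
    x = coords[:, 1] * (w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = (y - y0)[None, :]  # vertical interpolation weights, shape (1, N)
    wx = (x - x0)[None, :]  # horizontal interpolation weights, shape (1, N)
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T


def decode_mask(features, out_h, out_w, w1, b1, w2, b2):
    """Predict an (out_h, out_w) mask from feature maps of different scales.

    Hypothetical decoder: every output pixel is a continuous query point, so
    feature maps of any resolution can be aggregated without resizing them.
    """
    ys, xs = np.meshgrid(
        np.linspace(0.0, 1.0, out_h), np.linspace(0.0, 1.0, out_w), indexing="ij"
    )
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)  # (N, 2)
    sampled = [bilinear_sample(f, coords) for f in features]
    z = np.concatenate(sampled + [coords], axis=1)  # (N, sum(C_i) + 2)
    hidden = np.maximum(z @ w1 + b1, 0.0)  # ReLU layer
    logits = hidden @ w2 + b2  # (N, 1)
    prob = 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> foreground probability
    return prob.reshape(out_h, out_w)


# Usage with random stand-in features and untrained weights (assumptions).
rng = np.random.default_rng(0)
features = [rng.standard_normal((8, 4, 4)), rng.standard_normal((4, 16, 16))]
in_dim = 8 + 4 + 2  # channels of both maps plus the 2-D query coordinate
w1 = rng.standard_normal((in_dim, 16)) * 0.1
b1 = np.zeros(16)
w2 = rng.standard_normal((16, 1)) * 0.1
b2 = np.zeros(1)
mask = decode_mask(features, 32, 32, w1, b1, w2, b2)
```

Because the query grid is decoupled from the resolution of any input feature map, the same decoder can emit a mask at an arbitrary output size; in a trained model the MLP weights would be learned rather than drawn at random as they are here.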