Segmenting objects in video is challenging because of nuances such as motion blur, parallax, occlusion, and changes in illumination. Instead of addressing these nuances separately, we focus on building a generalizable solution that avoids overfitting to any individual one. Such a solution would also save the enormous resources spent on human annotation of video corpora. To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline, FODVid, that guides segmentation outputs using flow-guided graph-cut and temporal consistency. Specifically, we design a segmentation model that incorporates intra-frame appearance and flow similarities together with the inter-frame temporal continuity of the objects under consideration. We perform an extensive experimental analysis of our straightforward methodology on the standard DAVIS16 video benchmark. Though simple, our approach produces results comparable (within ~2 mIoU) to the existing top approaches in unsupervised VOS. The simplicity and effectiveness of our technique open up new avenues for research in the video domain.
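For readers unfamiliar with the metric behind the "~2 mIoU" comparison, the following is a minimal sketch of mean intersection-over-union for binary object masks. It is not the official DAVIS16 evaluation code. The function names `frame_iou` and `mean_iou`, the nested-list mask representation, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration. The sketch returns a value in [0, 1]; a gap of "~2 mIoU" presumably refers to this value expressed in percentage points.

```python
def frame_iou(pred, gt):
    """IoU between two binary masks given as equally sized 2-D sequences."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p, g = bool(p), bool(g)
            inter += p and g   # pixel is foreground in both masks
            union += p or g    # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds, gts):
    """Average the per-frame IoU over a sequence of (prediction, ground truth) pairs."""
    scores = [frame_iou(p, g) for p, g in zip(preds, gts)]
    if not scores:
        raise ValueError("mean_iou needs at least one frame")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two toy 2x2 frames: the first prediction over-segments, the second is exact.
    preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
    gts = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
    print(f"mIoU = {100 * mean_iou(preds, gts):.1f}")
```

In practice the masks would be image-sized arrays and the loops would be vectorized; the nested-loop form is used here only to keep the definition explicit.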