Approaches based on deep convolutional neural networks (CNNs) have achieved strong performance in video matting. Many of these methods produce accurate alpha estimates for the target body but typically yield fuzzy or incorrect target edges. This is usually caused by two factors: 1) existing methods treat the target body and the target edge indiscriminately; 2) the target body dominates the whole target, while the target edge accounts for only a tiny proportion of it. To address the first problem, we propose a CNN-based module that separately optimizes the matting target body and edge (SOBE). On this basis, we introduce a real-time, trimap-free video matting method via progressively optimizing the matting target body and edge (POBEVM), which is much lighter than previous approaches and achieves significant improvements in the predicted target edge. To address the second problem, we propose an Edge-L1-Loss (ELL) function that forces our network to focus on the matting target edge. Experiments demonstrate that our method outperforms prior trimap-free matting methods on both the Distinctions-646 (D646) and VideoMatte240K (VM) datasets, especially in edge optimization.
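To make the idea behind an edge-focused L1 loss concrete, the following is a minimal sketch, not the ELL definition used by POBEVM, which the abstract does not give. It assumes the edge region is the set of pixels where the ground-truth alpha changes within a small neighborhood (a morphological gradient), and it averages the absolute error over that region only. The names `edge_band` and `edge_l1_loss` and the parameters `radius` and `thresh` are hypothetical.

```python
import numpy as np


def edge_band(alpha_gt, radius=1, thresh=0.01):
    """Boolean mask of pixels whose neighborhood contains an alpha transition.

    Assumption: the edge is approximated by the morphological gradient
    (local max minus local min) of the ground-truth alpha matte.
    """
    alpha_gt = np.asarray(alpha_gt, dtype=np.float64)
    h, w = alpha_gt.shape
    padded = np.pad(alpha_gt, radius, mode="edge")
    local_max = np.full((h, w), -np.inf)
    local_min = np.full((h, w), np.inf)
    # Slide a (2*radius+1)^2 window by stacking shifted views.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            window = padded[dy:dy + h, dx:dx + w]
            local_max = np.maximum(local_max, window)
            local_min = np.minimum(local_min, window)
    return (local_max - local_min) > thresh


def edge_l1_loss(alpha_pred, alpha_gt, radius=1, thresh=0.01):
    """Mean absolute error restricted to the (assumed) edge band."""
    alpha_pred = np.asarray(alpha_pred, dtype=np.float64)
    alpha_gt = np.asarray(alpha_gt, dtype=np.float64)
    mask = edge_band(alpha_gt, radius, thresh)
    count = mask.sum()
    if count == 0:
        return 0.0  # no edge pixels: nothing to penalize
    return float(np.abs(alpha_pred - alpha_gt)[mask].sum() / count)


# Toy matte: left half background (0), right half foreground (1).
gt = np.zeros((8, 8))
gt[:, 4:] = 1.0

body_error = gt.copy()
body_error[0, 7] = 0.5   # mistake deep inside the target body

edge_error = gt.copy()
edge_error[0, 4] = 0.5   # same-sized mistake on the boundary column

loss_body = edge_l1_loss(body_error, gt)
loss_edge = edge_l1_loss(edge_error, gt)
plain_l1_edge = float(np.abs(edge_error - gt).mean())
```

In this toy case the edge band covers only the two columns next to the boundary (16 of 64 pixels), so an error on the boundary is weighted four times more heavily than under a plain L1 loss averaged over the whole image, while an error inside the body contributes nothing to this term. This illustrates why such a term can counteract the body's dominance over the edge; in practice it would presumably be combined with a loss over the full matte.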