Deep learning based video frame interpolation (VIF) methods, which synthesize intermediate frames to enhance video quality, have advanced rapidly in the past few years. This paper investigates the adversarial robustness of VIF models. We apply adversarial attacks to VIF models and find that they are highly vulnerable to adversarial examples. To improve attack efficiency, we propose exploiting a property of the video frame interpolation task: because the difference between adjacent frames is usually small, the corresponding adversarial perturbations should be similar as well. Based on this intuition, we propose a novel attack method named Inter-frame Accelerate Attack (IAA), which initializes the perturbation for each frame with the perturbation computed for the previous adjacent frame and thereby reduces the number of attack iterations. We show that our method greatly improves attack efficiency while achieving attack performance comparable to traditional methods. We also extend our method to video recognition models, a higher-level vision task, and achieve high attack efficiency there as well.
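The warm-start idea behind IAA can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the base attack is taken to be a PGD-style sign-gradient attack (the abstract does not name one), the interpolation network is replaced by a toy pixelwise `model` with an analytic gradient, and the names `pgd`, `iaa_attack`, `first_steps`, and `later_steps` are hypothetical.

```python
import numpy as np


def model(pair, w=1.0):
    # Toy stand-in for a VIF network (assumption): a pixelwise nonlinear
    # blend of the two input frames.
    return np.tanh(w * (pair[0] + pair[1]) / 2)


def loss_and_grad(pair, target, w=1.0):
    # MSE between the interpolated frame and the reference middle frame,
    # plus its analytic gradient with respect to both input frames.
    out = model(pair, w)
    diff = out - target
    loss = np.mean(diff ** 2)
    g_in = (2 * diff / diff.size) * (1 - out ** 2) * w / 2
    return loss, np.stack([g_in, g_in])


def pgd(pair, target, eps, alpha, steps, init=None):
    # PGD-style attack (assumed base attack). `init` is the optional
    # warm-start perturbation; without it the attack starts from zero.
    delta = np.zeros_like(pair) if init is None else np.clip(init, -eps, eps)
    for _ in range(steps):
        _, grad = loss_and_grad(pair + delta, target)
        # Ascend the loss, then project back into the L-infinity ball.
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta


def iaa_attack(frames, targets, eps, alpha, first_steps, later_steps):
    # Attack consecutive frame pairs. The first pair gets a full-length
    # attack; every later pair is initialized with the previous pair's
    # perturbation and runs for fewer iterations.
    deltas, prev, total_steps = [], None, 0
    for i in range(len(frames) - 1):
        steps = first_steps if prev is None else later_steps
        prev = pgd(frames[i:i + 2], targets[i], eps, alpha, steps, init=prev)
        deltas.append(prev)
        total_steps += steps
    return deltas, total_steps


# Hypothetical usage: a slowly drifting synthetic "video" of five frames.
rng = np.random.default_rng(0)
base = rng.uniform(0.2, 0.8, size=(8, 8))
frames = np.stack([base + 0.01 * t for t in range(5)])
targets = (frames[:-1] + frames[1:]) / 2
deltas, total_steps = iaa_attack(frames, targets, eps=0.03, alpha=0.0075,
                                 first_steps=10, later_steps=2)
print("gradient steps used:", total_steps)
```

In this sketch the saving comes entirely from `later_steps` being smaller than `first_steps`; how many iterations can actually be dropped on a real VIF model depends on how similar adjacent frames are, which the toy setup does not capture.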