Video frame interpolation (VFI) models apply the convolution operation at every spatial location, which leads to redundant computation in regions with easy motion. Dynamic spatial pruning methods can skip such redundant computation, but without supervision they cannot reliably identify the easy regions in VFI tasks. In this paper, we develop an Uncertainty-Guided Spatial Pruning (UGSP) architecture that dynamically skips redundant computation for efficient frame interpolation. Specifically, pixels with low uncertainty indicate easy regions, where computation can be reduced without introducing undesirable visual results. We therefore use uncertainty-generated mask labels to guide UGSP in locating the easy regions. Furthermore, we propose a self-contrast training strategy that leverages an auxiliary non-pruning branch to improve the performance of UGSP. Extensive experiments show that UGSP maintains performance while reducing FLOPs by 34%, 52%, and 30% on the Vimeo90K, UCF101, and MiddleBury datasets, respectively, compared with the baseline without pruning. In addition, our method achieves state-of-the-art performance with lower FLOPs on multiple benchmarks.
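The idea that low-uncertainty pixels mark easy regions can be illustrated with a minimal sketch. It assumes a per-pixel uncertainty map is already available and that mask labels come from a fixed threshold. The function names, the threshold value, and the toy data are hypothetical and are not taken from the paper, whose actual label-generation procedure is not specified in this abstract.

```python
# Illustrative sketch only: thresholding is an assumed way to turn an
# uncertainty map into mask labels, not the procedure confirmed by the paper.

def uncertainty_to_mask(uncertainty, threshold):
    """Return a binary mask: 1 = hard pixel (compute), 0 = easy pixel (skip)."""
    return [[1 if u > threshold else 0 for u in row] for row in uncertainty]


def skipped_fraction(mask):
    """Fraction of spatial locations whose computation would be skipped."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return 1.0 - kept / total


# Toy 3x3 uncertainty map (hypothetical values in [0, 1]).
uncertainty = [
    [0.05, 0.10, 0.80],
    [0.02, 0.60, 0.90],
    [0.01, 0.03, 0.70],
]

mask = uncertainty_to_mask(uncertainty, threshold=0.5)
for row in mask:
    print(row)
print(f"skipped fraction: {skipped_fraction(mask):.2f}")
```

In this toy case, five of the nine locations fall below the threshold and would be skipped. In a real network the fraction of skipped locations, and hence the FLOPs saved, would depend on the content of each input.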