Video frame interpolation is an important low-level vision task that increases frame rate for a smoother visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a large amount of inefficient computation. Moreover, the degree to which computation can be compressed in frame interpolation depends heavily on both texture distribution and scene motion, so selecting a suitable compression degree requires understanding the spatio-temporal information of each input frame pair. In this work, we propose a novel two-stage frame interpolation framework, termed WaveletVFI, to address the above problems. It first estimates intermediate optical flow with a lightweight motion perception network. A wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as previous methods do, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves greater computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods. Code is available at https://github.com/ltkong218/WaveletVFI.
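To make the role of the threshold ratio concrete, the following is a minimal NumPy sketch and not the WaveletVFI implementation. It assumes a single-level Haar transform, a hand-written masking rule (keep a location when its largest high-frequency magnitude exceeds the ratio times the peak magnitude), and a user-supplied ratio in place of the learned classifier. All function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform; x must have even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency band
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def sparse_mask(highs, threshold_ratio):
    """Assumed rule: a location is 'valid' if its largest high-frequency
    magnitude exceeds threshold_ratio times the peak magnitude."""
    mag = np.max(np.abs(np.stack(highs)), axis=0)
    return mag > threshold_ratio * mag.max()


def demo(threshold_ratio):
    """Return (fraction of locations kept, max reconstruction error)."""
    rng = np.random.default_rng(0)
    frame = np.full((64, 64), 0.5)                # flat background
    frame[16:32, 16:32] = rng.random((16, 16))    # one textured patch
    ll, highs = haar_dwt2(frame)
    mask = sparse_mask(highs, threshold_ratio)
    pruned = tuple(h * mask for h in highs)       # zero out masked locations
    recon = haar_idwt2(ll, pruned)
    return mask.mean(), np.abs(recon - frame).max()


if __name__ == "__main__":
    for ratio in (0.0, 0.1, 0.5):
        kept, err = demo(ratio)
        print(f"ratio={ratio}: kept={kept:.4f}, max_err={err:.4f}")
```

In this toy setting, flat regions produce zero high-frequency coefficients and are masked out at any ratio, while a larger ratio also discards weakly textured locations at the cost of reconstruction error. This is the trade-off that a per-sample threshold is meant to balance.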