This paper presents a new deformable convolution-based video frame interpolation (VFI) method that uses a coarse-to-fine 3D CNN to enhance multi-flow prediction. The model first extracts spatio-temporal features at multiple scales using the 3D CNN, then estimates multi-flows from these features in a coarse-to-fine manner. The estimated multi-flows are used to warp the original input frames as well as context maps, and the warped results are fused by a synthesis network to produce the final output. The approach has been evaluated against 12 state-of-the-art VFI methods on three commonly used test databases. The results demonstrate the effectiveness of the proposed method, which offers superior interpolation performance over other state-of-the-art algorithms, with PSNR gains of up to 0.19 dB.
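The multi-flow warping step can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so this assumes a common deformable-convolution-style scheme: each output pixel is a weighted sum of N bilinear samples of the input frame, taken at positions displaced by N per-pixel flows. The names `bilinear_sample` and `multi_flow_warp`, the `(dy, dx)` flow layout, the single-channel frame, and the border clamping are all assumptions made for illustration, not details of the proposed model.

```python
import numpy as np


def bilinear_sample(frame, ys, xs):
    """Sample a 2-D frame at fractional coordinates, clamping to the border."""
    h, w = frame.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    fy = ys - y0  # fractional offsets inside the pixel cell
    fx = xs - x0
    top = (1 - fx) * frame[y0, x0] + fx * frame[y0, x1]
    bottom = (1 - fx) * frame[y1, x0] + fx * frame[y1, x1]
    return (1 - fy) * top + fy * bottom


def multi_flow_warp(frame, flows, weights):
    """Warp `frame` (H, W) with N flows.

    flows:   (N, H, W, 2) per-pixel displacements stored as (dy, dx) -- assumed layout
    weights: (N, H, W) per-pixel blending weights, assumed to sum to 1 over N
    """
    h, w = frame.shape
    grid_y, grid_x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.zeros((h, w), dtype=np.float64)
    for flow, weight in zip(flows, weights):
        sampled = bilinear_sample(frame, grid_y + flow[..., 0], grid_x + flow[..., 1])
        out += weight * sampled
    return out
```

In the described pipeline, the flows and weights would come from the coarse-to-fine 3D CNN, and the warped frames and context maps would be passed to the synthesis network. In this sketch they are plain arrays supplied by the caller, and with all-zero flows the function returns the input frame unchanged.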