Large motion poses a critical challenge in the Video Frame Interpolation (VFI) task. Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance in scenarios with large motion. In this paper, we introduce a new pipeline for VFI that effectively integrates global-level information to alleviate issues associated with large motion. Specifically, we first estimate a pair of initial intermediate flows using a high-resolution feature map to extract local details. Then, we incorporate a sparse global matching branch to compensate for flow estimation errors; it identifies flaws in the initial flows and generates sparse flow compensation with a global receptive field. Finally, we adaptively merge the initial flow estimation with the global flow compensation, yielding a more accurate intermediate flow. To evaluate the effectiveness of our method in handling large motion, we carefully curate more challenging subsets from commonly used benchmarks. Our method achieves state-of-the-art performance on these large-motion VFI subsets.
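The three stages described in the abstract (initial flow, sparse global compensation, merge) can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function name `sparse_global_compensation`, the `flaw_score` input, the top-`k` selection rule, the dot-product similarity, and the hard replace-at-selected-points merge are all assumptions made for illustration. For simplicity it also matches a source feature map directly against a target feature map rather than estimating flows from an intermediate time step.

```python
import numpy as np

def sparse_global_compensation(feat_src, feat_tgt, init_flow, flaw_score, k=4):
    """Illustrative sketch only (hypothetical names, not the paper's code).

    feat_src, feat_tgt: (H, W, C) feature maps of the two frames.
    init_flow: (H, W, 2) initial flow stored as (dx, dy).
    flaw_score: (H, W), higher means the initial flow is trusted less.
    """
    H, W, C = feat_src.shape

    # Step 1 (assumed rule): flag the k least-trusted pixels as "flaws".
    idx = np.argsort(flaw_score.ravel())[-k:]
    ys, xs = np.unravel_index(idx, (H, W))

    # Step 2: global matching. Each flagged feature is compared with every
    # location of the target map, so the receptive field is the whole frame.
    sim = feat_src[ys, xs] @ feat_tgt.reshape(-1, C).T   # shape (k, H*W)
    my, mx = np.unravel_index(sim.argmax(axis=1), (H, W))
    global_flow = np.stack([mx - xs, my - ys], axis=1).astype(init_flow.dtype)

    # Step 3 (assumed merge): add a sparse compensation term to the initial
    # flow, weighted by a hard 0/1 mask at the flagged pixels.
    weight = np.zeros((H, W, 1), dtype=init_flow.dtype)
    weight[ys, xs] = 1.0
    compensation = np.zeros_like(init_flow)
    compensation[ys, xs] = global_flow - init_flow[ys, xs]
    return init_flow + weight * compensation
```

A learned, soft merge weight would be closer to the "adaptive" merging the abstract mentions; the hard mask here only keeps the sketch short. The point of the example is that the flagged pixels can recover displacements of any size, because the match is searched over the entire target map rather than a local window.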