Video frame interpolation aims to synthesize high-quality intermediate frames between boundary frames, thereby increasing the frame rate. Existing linear, symmetric, and nonlinear motion models are used to compensate for the missing inter-frame motion information, but they cannot reconstruct the real motion. Event cameras, by contrast, are well suited to capturing inter-frame dynamics thanks to their extremely high temporal resolution. In this paper, we propose IDO-VFI, an event-and-frame-based video frame interpolation method that assigns varying amounts of computation to different sub-regions under optical flow guidance. The proposed method first estimates the optical flow from frames and events, and then uses a Gumbel gating module to decide, according to the optical flow amplitude, whether to further compute the residual optical flow in each sub-region. Intermediate frames are finally generated by a concise Transformer-based fusion network. Compared with a unified process applied to the whole region, our method maintains high-quality performance while reducing computation time and computational effort by 10% and 17%, respectively, on the Vimeo90K dataset. Moreover, our method outperforms state-of-the-art frame-only and frames-plus-events methods on multiple video frame interpolation benchmarks. Code and models are available at https://github.com/shicy17/IDO-VFI.
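The gating step described in the abstract can be illustrated with a minimal sketch. It is not the IDO-VFI implementation: the abstract does not specify how the gate is parameterized, so the hand-set logit `(magnitude - threshold) / temperature`, the block size, and the function names `block_flow_magnitude`, `sample_gumbel`, and `gumbel_gate` are all assumptions made for illustration. In the actual method the gate is a module in the network, and a learned gate would replace the fixed threshold used here.

```python
import math
import random


def block_flow_magnitude(flow, block):
    """Mean optical-flow magnitude for each block x block sub-region.

    `flow` is an H x W nested list of (u, v) displacement tuples.
    """
    height, width = len(flow), len(flow[0])
    rows = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            mags = [
                math.hypot(*flow[y][x])
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            row.append(sum(mags) / len(mags))
        rows.append(row)
    return rows


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    # Clamp away from 0 so that log(u) stays finite.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def gumbel_gate(magnitudes, threshold=1.0, temperature=1.0, rng=None):
    """Return a 0/1 mask: 1 means "compute the residual flow here".

    With rng=None the decision is deterministic (logit > 0). With an rng,
    Gumbel noise is added to the "refine" and "skip" logits and the larger
    one wins (the Gumbel-max trick).
    """
    mask = []
    for row in magnitudes:
        out = []
        for mag in row:
            # Hypothetical hand-set logit; a trained gate would predict this.
            logit = (mag - threshold) / temperature
            if rng is None:
                out.append(1 if logit > 0 else 0)
            else:
                refine = logit + sample_gumbel(rng)
                skip = 0.0 + sample_gumbel(rng)
                out.append(1 if refine > skip else 0)
        mask.append(out)
    return mask


# Toy 4 x 4 flow field: the top half is static, the bottom half moves by (3, 4).
flow = [[(0.0, 0.0)] * 4 for _ in range(2)] + [[(3.0, 4.0)] * 4 for _ in range(2)]
mags = block_flow_magnitude(flow, block=2)
mask = gumbel_gate(mags, threshold=1.0)
```

In this toy example only the two bottom sub-regions are selected, so a residual-flow stage would run on half of the image. Passing `rng=random.Random(seed)` switches to the stochastic decision, which is the form typically used during training so that the hard choice can be paired with a differentiable relaxation.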