Video Frame Interpolation (VFI) aims to generate intermediate video frames between consecutive input frames. Since event cameras are bio-inspired sensors that encode only brightness changes at microsecond temporal resolution, several works have utilized them to enhance VFI performance. However, existing methods estimate bidirectional inter-frame motion fields using only events or approximations, which cannot account for the complex motion of real-world scenarios. In this paper, we propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation. In detail, our EIF-BiOFNet exploits the valuable characteristics of both events and images to directly estimate inter-frame motion fields without any approximation. Moreover, we develop an interactive attention-based frame synthesis network to efficiently leverage the complementary warping-based and synthesis-based features. Finally, we build a large-scale event-based VFI dataset, ERF-X170FPS, with a high frame rate, extreme motion, and dynamic textures, overcoming the limitations of previous event-based VFI datasets. Extensive experiments validate that our method significantly outperforms state-of-the-art VFI methods on various datasets. Our project page is available at: https://github.com/intelpro/CBMNet
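As background for the warping-based branch mentioned above, the sketch below shows the standard backward-warping operation used to align an input frame to the target timestamp with an estimated inter-frame motion field. It is a minimal PyTorch illustration under standard VFI conventions, not the authors' implementation; the function name `backward_warp` and its tensor layout are assumptions.

```python
# Minimal sketch (not the authors' code): backward warping a frame with an
# estimated inter-frame motion field, the basic operation behind
# warping-based VFI features. Assumes PyTorch (>= 1.10 for indexing="ij").
import torch
import torch.nn.functional as F

def backward_warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `frame` (B, C, H, W) toward the target time using `flow`
    (B, 2, H, W), a motion field mapping each target pixel to its
    source location in `frame` (x-displacement in channel 0, y in 1)."""
    b, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # (B, H, W)
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

In a bidirectional setup, both inputs would be aligned to the latent intermediate time, e.g. `backward_warp(I0, flow_t0)` and `backward_warp(I1, flow_t1)`, and a synthesis network would then fuse the two warped results, which is the role the interactive attention-based frame synthesis network plays in this framework.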