Dynamic vision sensors, or event cameras, provide rich complementary information for video frame interpolation. Existing state-of-the-art methods follow the paradigm of combining synthesis-based and warping-based networks. However, few of these methods fully respect the intrinsic characteristics of event streams. Because event cameras encode only intensity changes and their polarity rather than color intensities, estimating optical flow from events is arguably more difficult than estimating it from RGB information. We therefore propose to incorporate RGB information in an event-guided optical flow refinement strategy. Moreover, in light of the quasi-continuous nature of the time signals provided by event cameras, we propose a divide-and-conquer strategy in which event-based intermediate frame synthesis proceeds incrementally over multiple simplified stages rather than in a single, long stage. Extensive experiments on both synthetic and real-world datasets show that these modifications yield more reliable and realistic intermediate frames than previous video frame interpolation methods. Our findings underline that careful consideration of event characteristics, such as high temporal density and elevated noise, benefits interpolation accuracy.
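To make the divide-and-conquer idea concrete, the following minimal sketch shows one way an event stream could be partitioned into several short temporal stages, each summarized as a signed event-count frame. It is an illustration under stated assumptions, not the method evaluated here: the function name `split_events_into_stages`, the `(t, x, y, polarity)` event layout, the uniform stage boundaries, and the accumulation into count frames are all hypothetical choices.

```python
import numpy as np


def split_events_into_stages(events, t_start, t_target, num_stages, height, width):
    """Partition events into equal-length temporal stages (hypothetical helper).

    events: array of shape (N, 4) with columns (t, x, y, polarity),
            where polarity is -1 or +1 (assumed layout).
    Returns a list of `num_stages` frames of shape (height, width); each frame
    holds the per-pixel sum of polarities of the events in that stage.
    """
    # Uniform stage boundaries between the start time and the target time.
    edges = np.linspace(t_start, t_target, num_stages + 1)
    timestamps = events[:, 0]
    frames = []
    for i in range(num_stages):
        lo, hi = edges[i], edges[i + 1]
        # Half-open intervals, except the last stage, which includes t_target.
        if i == num_stages - 1:
            mask = (timestamps >= lo) & (timestamps <= hi)
        else:
            mask = (timestamps >= lo) & (timestamps < hi)
        chunk = events[mask]
        frame = np.zeros((height, width), dtype=np.float32)
        rows = chunk[:, 2].astype(int)  # y coordinate indexes rows
        cols = chunk[:, 1].astype(int)  # x coordinate indexes columns
        # np.add.at accumulates correctly when a pixel fires more than once.
        np.add.at(frame, (rows, cols), chunk[:, 3].astype(np.float32))
        frames.append(frame)
    return frames


# Toy usage: four events on a 2x2 sensor, split into two stages.
toy_events = np.array([
    [0.0, 0, 0, 1.0],
    [0.1, 1, 0, -1.0],
    [0.6, 1, 1, 1.0],
    [1.0, 0, 0, 1.0],
])
stage_frames = split_events_into_stages(toy_events, 0.0, 1.0, 2, 2, 2)
```

In a staged pipeline, each such frame would condition one short synthesis step, so that no single step has to explain the full motion between the two input frames. How the stages are actually chosen and consumed is not specified by the text above.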