IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events

Video frame interpolation (VFI) increases the video frame rate by inserting a reconstructed frame between two consecutive frames. Because an ordinary camera has a fixed frame rate, frame-only VFI methods inevitably lose the dynamics in the interval between consecutive frames. To compensate for the missing inter-frame information, motion models are often used, but those models cannot account for the real motion. Event cameras are bio-inspired vision sensors in which each pixel independently perceives and encodes relative changes in light intensity. Instead of frames, event cameras output sparse, asynchronous streams of events, with the advantages of high temporal resolution, high dynamic range, and low power consumption. An event is usually expressed as a tuple e=(x,y,p,t), meaning that at timestamp t an event with polarity p is generated at pixel (x,y). Positive polarity indicates that the light intensity has increased (from weak to strong) by more than the threshold, while negative polarity indicates the opposite. Because an event camera has a temporal resolution on the order of microseconds, it can capture the complete changes or motion between frames. The event stream directly embodies the inter-frame changes. Therefore, optical flow estimated from events does not require fitting any motion model and can be inherently nonlinear. Since events lack intensity information, frame-based optical flow is complementary to event-based optical flow, and combining the two yields more accurate estimates. Moreover, because the real inter-frame dynamics are captured, high-quality keyframes can be reconstructed at any timestamp.
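
The event tuple e=(x,y,p,t) and the threshold rule for polarity can be illustrated with a minimal sketch for a single pixel. It assumes the commonly used idealized model in which an event fires each time the log intensity moves by a fixed contrast threshold from the level at the last event. The log-intensity formulation, the threshold value, and the names `Event` and `generate_events` are assumptions made for illustration; they are not taken from the text above.

```python
import math
from typing import List, NamedTuple, Sequence


class Event(NamedTuple):
    """One event e = (x, y, p, t); the class name is hypothetical."""
    x: int
    y: int
    p: int      # polarity: +1 (intensity increased) or -1 (intensity decreased)
    t: float    # timestamp in seconds


def generate_events(
    x: int,
    y: int,
    intensities: Sequence[float],
    timestamps: Sequence[float],
    threshold: float = 0.3,   # assumed contrast threshold on log intensity
) -> List[Event]:
    """Emit events for one pixel under an idealized threshold model.

    An event is emitted each time the log intensity differs from the
    reference level (the level at the last event) by at least `threshold`.
    """
    events: List[Event] = []
    reference = math.log(intensities[0])
    for value, t in zip(intensities[1:], timestamps[1:]):
        diff = math.log(value) - reference
        # A large change between two samples produces several events.
        while abs(diff) >= threshold:
            polarity = 1 if diff > 0 else -1
            events.append(Event(x, y, polarity, t))
            reference += polarity * threshold
            diff = math.log(value) - reference
    return events


# Usage: a pixel that brightens and then darkens again.
events = generate_events(
    x=5, y=7,
    intensities=[1.0, 2.0, 0.9],
    timestamps=[0.0, 0.001, 0.002],
)
for e in events:
    print(e)
```

In this sketch the brightening step yields positive events and the darkening step yields negative ones, which matches the polarity convention described above. A real sensor adds noise, refractory periods, and per-pixel threshold variation that this sketch ignores.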
