Learning INR for Event-guided Rolling Shutter Frame Correction, Deblur, and Interpolation

Images captured by rolling shutter (RS) cameras under fast camera motion often contain obvious distortions and blur, which can be modeled as a row-wise combination of a sequence of global shutter (GS) frames within the exposure time. Naturally, recovering high-frame-rate sharp GS frames from an RS blur image requires considering RS correction, deblurring, and frame interpolation simultaneously. Tackling this task is nontrivial and, to our knowledge, no feasible solution exists so far. A naive approach is to decompose the complete process into separate tasks and simply cascade existing methods; however, this results in cumulative errors and noticeable artifacts. Event cameras offer many advantages, e.g., high temporal resolution, which makes them promising for our problem. To this end, we make the first attempt to recover high-frame-rate sharp GS frames from an RS blur image and paired event data. Our key idea is to learn an implicit neural representation (INR) that directly maps position and time coordinates to RGB values, addressing the interlocking degradations in the image restoration process. Specifically, we introduce spatial-temporal implicit encoding (STE) to convert an RS blur image and events into a spatial-temporal representation (STR). To query a specific sharp frame (GS or RS), we embed the exposure time into the STR and decode the embedded features to recover the sharp frame. Moreover, we propose an RS blur image-guided integral loss to better train the network. Our method is relatively lightweight, containing only 0.379M parameters, and is efficient because the STE is called only once for any number of interpolated frames. Extensive experiments show that our method significantly outperforms prior methods that address only one or two of the tasks.
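To make the degradation model concrete, the following is a minimal NumPy sketch of how an RS blur image can be formed as a row-wise combination of GS frames, together with one possible reading of a blur-image-guided integral loss. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the `readout_frac` parameter, the normalized time axis, the uniform averaging, and the L1 distance are all hypothetical choices.

```python
import numpy as np

def synthesize_rs_blur(gs_frames, readout_frac=0.5):
    """Form an RS blur image from a stack of sharp GS frames.

    gs_frames: array of shape (T, H, W, C), assumed to be uniformly
        sampled over a normalized time axis [0, 1].
    readout_frac: hypothetical parameter; fraction of the time axis spent
        on row readout. Each row is exposed for 1 - readout_frac.
    """
    T, H, W, C = gs_frames.shape
    times = np.linspace(0.0, 1.0, T)        # timestamp of each GS frame
    exposure = 1.0 - readout_frac           # per-row exposure length
    rs_blur = np.empty((H, W, C), dtype=np.float64)
    for r in range(H):
        # Row r starts exposing later than row r - 1 (rolling shutter).
        start = readout_frac * r / max(H - 1, 1)
        mask = (times >= start - 1e-9) & (times <= start + exposure + 1e-9)
        if not mask.any():
            # Exposure shorter than the frame spacing: use the nearest frame.
            mask[np.argmin(np.abs(times - (start + exposure / 2)))] = True
        # Average the GS frames that fall inside this row's exposure window.
        rs_blur[r] = gs_frames[mask, r].mean(axis=0)
    return rs_blur

def integral_loss(pred_gs_frames, rs_blur_input, readout_frac=0.5):
    """Assumed form of an integral loss: re-synthesize the RS blur image
    from predicted GS frames and compare it to the input (L1 distance)."""
    resynth = synthesize_rs_blur(pred_gs_frames, readout_frac)
    return float(np.abs(resynth - rs_blur_input).mean())
```

In this sketch each row sees a different time window, so a single RS blur image mixes geometric distortion (different start times per row) with motion blur (averaging within each window), which is why the three restoration tasks are coupled.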
