Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations

Reconstructing intensity frames from event data while maintaining high temporal resolution and dynamic range is crucial for bridging the gap between event-based and frame-based computer vision. Previous approaches have depended on supervised learning on synthetic data, which lacks interpretability and risks overfitting to the settings of the event simulator. Recently, self-supervised learning (SSL) based methods, which primarily use per-frame optical flow to estimate intensity via photometric constancy, have been actively investigated. However, they are vulnerable to errors when the optical flow is inaccurate. This paper proposes a novel SSL event-to-video reconstruction approach, dubbed EvINR, which eliminates the need for labeled data or optical flow estimation. Our core idea is to reconstruct intensity frames by directly solving the event generation model, essentially a partial differential equation (PDE) that describes how events are generated from time-varying brightness signals. Specifically, we use an implicit neural representation (INR), which takes a spatiotemporal coordinate $(x, y, t)$ as input and predicts an intensity value, to represent the solution of the event generation equation. The INR, parameterized as a fully-connected Multi-layer Perceptron (MLP), can be optimized with its temporal derivatives supervised by events. To make EvINR feasible for online requirements, we propose several acceleration techniques that substantially expedite the training process. Comprehensive experiments demonstrate that EvINR surpasses previous SSL methods by 38% in terms of Mean Squared Error (MSE) and is comparable or superior to state-of-the-art supervised methods. Project page: https://vlislab22.github.io/EvINR/.
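
The idea of supervising the temporal derivative of an INR with events can be illustrated with a minimal sketch. This is not the EvINR implementation: the one-hidden-layer tanh network, the log-intensity output, the contrast threshold of 0.2, and all function names (`init_mlp`, `log_intensity`, `dlog_intensity_dt`, `event_loss`) are assumptions made for illustration, and no training loop or acceleration technique is shown. The sketch only shows how a network evaluated at $(x, y, t)$ yields a temporal derivative that can be compared against the brightness change implied by the events at a pixel.

```python
import math
import random


def init_mlp(hidden=16, seed=0):
    """Random weights for a toy one-hidden-layer tanh MLP: (x, y, t) -> scalar."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    return w1, b1, w2


def log_intensity(params, x, y, t):
    """INR forward pass; the output is assumed to be log-intensity."""
    w1, b1, w2 = params
    return sum(
        w2j * math.tanh(w[0] * x + w[1] * y + w[2] * t + b)
        for w, b, w2j in zip(w1, b1, w2)
    )


def dlog_intensity_dt(params, x, y, t):
    """Analytic derivative of the network output with respect to t."""
    w1, b1, w2 = params
    total = 0.0
    for w, b, w2j in zip(w1, b1, w2):
        h = math.tanh(w[0] * x + w[1] * y + w[2] * t + b)
        total += w2j * (1.0 - h * h) * w[2]  # chain rule through tanh
    return total


def event_loss(params, x, y, t, net_polarity, dt, threshold=0.2):
    """Squared residual between the predicted brightness change over dt
    and the change implied by events (assumed threshold * net polarity)."""
    predicted_change = dlog_intensity_dt(params, x, y, t) * dt
    observed_change = threshold * net_polarity
    return (predicted_change - observed_change) ** 2


params = init_mlp()
# Hypothetical pixel (0.1, -0.3) at time 0.5 with two positive events in a 0.01 window.
loss = event_loss(params, 0.1, -0.3, 0.5, net_polarity=2, dt=0.01)
```

In a full pipeline this residual would be averaged over many sampled coordinates and minimized with a gradient-based optimizer, after which intensity frames could be read out by querying the network at any timestamp.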
