Event-stream representation is the first step in many computer vision tasks using event cameras. It converts asynchronous event-streams into a structured format so that conventional machine learning models can be applied directly. However, most state-of-the-art event-stream representations are manually designed, and their quality cannot be guaranteed due to the noisy nature of event-streams. In this paper, we introduce a data-driven approach for enhancing the quality of event-stream representations. Our approach begins with a new event-stream representation based on spatio-temporal statistics, denoted EvRep. We then theoretically derive the intrinsic relationship between asynchronous event-streams and synchronous video frames. Building on this relationship, we train a representation generator, RepGen, which takes EvRep as input, in a self-supervised manner. Finally, event-streams are converted into high-quality representations, termed EvRepSL, by passing them through the learned RepGen (without any fine-tuning or retraining). Our methodology is validated through extensive evaluations on a variety of mainstream event-based classification and optical flow datasets captured with various types of event cameras. The experimental results highlight not only the superior performance of our approach over existing event-stream representations but also its versatility: it is agnostic to both the event camera and the downstream task.
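To make the pipeline concrete, the following is a minimal sketch of the EvRep-to-EvRepSL flow described above. The abstract does not specify which spatio-temporal statistics EvRep computes or what architecture RepGen uses, so the channel choices (signed event count, per-pixel timestamp mean and standard deviation), the small convolutional generator, and all function and variable names below are illustrative assumptions, not the authors' actual design; the self-supervised training of RepGen against video frames is omitted.

```python
# Hypothetical sketch of the EvRep -> RepGen -> EvRepSL pipeline.
# All statistics, layer sizes, and names are assumptions for illustration.
import numpy as np
import torch
import torch.nn as nn

def ev_rep(events, height, width):
    """Assumed EvRep: per-pixel spatio-temporal statistics of an event-stream.

    `events` is an (N, 4) array of (x, y, t, polarity) rows. We stack three
    plausible channels: signed event count, mean timestamp, and timestamp
    standard deviation per pixel. The paper's actual statistics may differ.
    """
    count = np.zeros((height, width), dtype=np.float32)
    t_sum = np.zeros((height, width), dtype=np.float32)
    t_sq = np.zeros((height, width), dtype=np.float32)
    n = np.zeros((height, width), dtype=np.float32)
    for x, y, t, p in events:
        xi, yi = int(x), int(y)
        count[yi, xi] += 1.0 if p > 0 else -1.0
        t_sum[yi, xi] += t
        t_sq[yi, xi] += t * t
        n[yi, xi] += 1.0
    safe_n = np.maximum(n, 1.0)
    mean_t = np.where(n > 0, t_sum / safe_n, 0.0)
    var_t = np.where(n > 0, t_sq / safe_n - mean_t ** 2, 0.0)
    std_t = np.sqrt(np.maximum(var_t, 0.0))
    return np.stack([count, mean_t, std_t])  # shape (3, H, W)

class RepGen(nn.Module):
    """Assumed RepGen: a small fully convolutional generator mapping the
    EvRep statistics to the refined EvRepSL representation."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

# Inference only: a trained RepGen converts EvRep to EvRepSL with no
# fine-tuning or retraining (here the weights are untrained stand-ins).
events = np.array([[10, 20, 0.01, 1], [11, 20, 0.02, -1]], dtype=np.float32)
rep = torch.from_numpy(ev_rep(events, 64, 64)).unsqueeze(0)  # (1, 3, 64, 64)
ev_rep_sl = RepGen().eval()(rep)
```

In this reading, EvRep is cheap and deterministic while all learned refinement is confined to RepGen, which is why, once trained, the generator can be applied to new event-streams without task-specific retraining.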