Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba

Event cameras draw inspiration from biological systems, offering low latency and high dynamic range while consuming minimal power. The prevailing approach to processing an Event Cloud converts it into frame-based representations, which neglects the sparsity of events, loses fine-grained temporal information, and increases the computational burden. In contrast, Point Cloud is a popular representation for processing 3-dimensional data and serves as an alternative that exploits local and global spatial features. Nevertheless, previous point-based methods have underperformed frame-based methods on spatio-temporal event streams. To bridge this gap, we propose EventMamba, an efficient and effective framework based on the Point Cloud representation, which rethinks the distinction between Event Cloud and Point Cloud and emphasizes vital temporal information. The Event Cloud is fed into a hierarchical structure with staged modules that process both implicit and explicit temporal features. Specifically, we redesign the global extractor to enhance explicit temporal extraction over a long sequence of events using temporal aggregation and the State Space Model (SSM) based Mamba. In our experiments, the model consumes minimal computational resources while still achieving state-of-the-art point-based performance on six action recognition datasets of different scales. It even outperforms all frame-based methods on both the Camera Pose Relocalization (CPR) and eye-tracking regression tasks. Our code is available at: https://github.com/rhwxmx/EventMamba.
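
To make the contrast between the two representations concrete, the following is a minimal sketch and not the EventMamba implementation. It assumes events are stored as rows of `(x, y, t, p)` with polarity `p` in `{-1, +1}`; the helper names `events_to_frame` and `events_to_cloud` and the toy data are hypothetical, introduced only for illustration.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate polarities into one 2D frame; timestamps are discarded."""
    frame = np.zeros((height, width), dtype=np.int32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3].astype(int)
    # np.add.at handles repeated pixel indices correctly (unbuffered add).
    np.add.at(frame, (y, x), p)
    return frame

def events_to_cloud(events):
    """Keep each event as a point (x, y, t), sorted by time, with t scaled to [0, 1]."""
    order = np.argsort(events[:, 2], kind="stable")
    cloud = events[order, :3].astype(np.float64)
    t = cloud[:, 2]
    span = t.max() - t.min()
    cloud[:, 2] = (t - t.min()) / span if span > 0 else 0.0
    return cloud

# Toy event stream (assumed layout: x, y, timestamp, polarity).
events = np.array([
    [1, 0, 300.0, 1],
    [0, 0, 100.0, 1],
    [1, 0, 200.0, -1],
    [2, 1, 500.0, 1],
])

frame = events_to_frame(events, height=2, width=3)
cloud = events_to_cloud(events)
```

In this toy stream the two opposite-polarity events at pixel `(x=1, y=0)` cancel in the frame, so their timing and order are lost, whereas the cloud keeps one row per event with its normalized timestamp. The frame also allocates a value for every pixel regardless of how few events fired.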
