Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba

Event cameras, inspired by biological vision, detect changes in ambient light with low latency, high dynamic range, and minimal power consumption. The prevailing approach to processing event data converts it into frame-based representations, which are well established in traditional vision. However, this conversion neglects the sparsity of event data, loses fine-grained temporal information, and increases the computational burden, so it fails to exploit the characteristic properties of event cameras. In contrast, the Point Cloud, a popular representation in 3D processing, is better suited to the sparse and asynchronous nature of event data. Despite this theoretical compatibility, point-based methods have so far shown an unsatisfactory performance gap relative to frame-based methods. To bridge this gap, we propose EventMamba, an efficient and effective Point Cloud framework that achieves competitive results even against state-of-the-art (SOTA) frame-based methods in both classification and regression tasks. This result stems from rethinking the distinction between Event Cloud and Point Cloud, with an emphasis on effective temporal information extraction through optimized network structures. Specifically, EventMamba leverages temporal aggregation and the State Space Model (SSM) based Mamba, which offers enhanced temporal information extraction capabilities. Through a hierarchical structure, EventMamba abstracts local and global spatial features as well as implicit and explicit temporal features. By adhering to a lightweight design principle, EventMamba delivers strong results with minimal computational resources, demonstrating both its efficiency and its effectiveness.
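The contrast between the two representations can be made concrete with a small sketch. It assumes each event is a tuple (x, y, t, polarity), which is the usual event camera output format but is not spelled out in the abstract. The function names, the normalization scheme, and the toy sensor size are hypothetical illustrations, not EventMamba's actual preprocessing.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp, polarity). Hypothetical, for illustration.
Event = Tuple[int, int, float, int]


def events_to_cloud(events: List[Event], width: int, height: int) -> List[Tuple[float, float, float]]:
    """Treat each event as a 3D point (x, y, t), normalized to [0, 1] and sorted by time.

    Every event keeps its own timestamp, so fine-grained temporal
    information and sparsity are preserved.
    """
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e[2])
    t0, t1 = ordered[0][2], ordered[-1][2]
    span = (t1 - t0) or 1.0  # avoid division by zero when all timestamps match
    return [
        (x / max(width - 1, 1), y / max(height - 1, 1), (t - t0) / span)
        for x, y, t, _ in ordered
    ]


def events_to_frame(events: List[Event], width: int, height: int) -> List[List[int]]:
    """Accumulate signed polarity per pixel into a dense frame.

    Timestamps are discarded and every pixel is stored, even where
    no event occurred.
    """
    frame = [[0] * width for _ in range(height)]
    for x, y, _, p in events:
        frame[y][x] += 1 if p > 0 else -1
    return frame


# Toy example: three events on a 4x2 sensor.
events = [(0, 0, 10.0, 1), (3, 1, 30.0, 0), (0, 0, 20.0, 1)]
cloud = events_to_cloud(events, width=4, height=2)
frame = events_to_frame(events, width=4, height=2)
print(len(cloud), "points vs", sum(len(row) for row in frame), "frame cells")
```

In this toy case the cloud stores one point per event and keeps the order and spacing of the timestamps, while the frame stores all eight pixels and merges the two events at pixel (0, 0) into a single count.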
