Event2Vec: Processing Neuromorphic Events Directly by Representations in Vector Space

Neuromorphic event cameras offer higher temporal resolution, better power efficiency, and wider dynamic range than traditional cameras. However, their asynchronous, sparse data format poses a significant challenge for conventional deep learning methods. Existing methods fall into two groups. Some convert events into dense, synchronous frame representations that powerful CNNs or Transformers can process, but the conversion discards the asynchronous, sparse, and high-temporal-resolution characteristics of events. Others adopt irregular models such as sparse convolutions, spiking neural networks, or graph neural networks to process irregular event representations, but fail to take full advantage of GPU acceleration. Inspired by word-to-vector models, we draw an analogy between words and events and introduce event2vec, a novel representation that allows neural networks to process events directly. This approach is fully compatible with the parallel processing capabilities of Transformers. We demonstrate the effectiveness of event2vec on the DVS Gesture, ASL-DVS, and DVS-Lip benchmarks, showing that it is remarkably parameter-efficient, delivers high throughput and low latency, and achieves high accuracy even with an extremely small number of events or at low spatial resolutions. Event2vec introduces a novel paradigm by demonstrating, for the first time, that sparse, irregular event data can be integrated directly into high-throughput Transformer architectures. This resolves the long-standing conflict between preserving data sparsity and maximizing GPU efficiency, offering a promising balance for real-time, low-latency neuromorphic vision tasks. The code is available at https://github.com/Intelligent-Computing-Lab-Panda/event2vec.
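The word-event analogy can be made concrete with a small sketch. The abstract does not specify how event2vec builds its vectors, so everything below is an illustrative assumption rather than the authors' method: each event is taken to be a tuple (x, y, p, t), the triple (x, y, p) is treated like a word index into a lookup table, and the timestamp t is added through a sinusoidal encoding of the kind Transformers use for positions. The function names, the 128 x 128 sensor size, and the embedding width of 8 are all hypothetical, and the table is random here where a real model would learn it.

```python
import math
import random


def make_embedding_table(height, width, dim, seed=0):
    """Random lookup table with one vector per (polarity, y, x) triple.

    Assumption: a trained model would learn these vectors; random values
    are used here only so that the sketch runs on its own.
    """
    rng = random.Random(seed)
    vocab_size = 2 * height * width  # two polarities per pixel
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


def event_to_token(x, y, p, height, width):
    """Flatten (x, y, polarity) into one index, like a word id in a vocabulary."""
    return (p * height + y) * width + x


def time_encoding(t, dim, max_period=10000.0):
    """Sinusoidal encoding of a timestamp (assumed; dim must be even)."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (max_period ** (i / dim))
        enc.append(math.sin(t * freq))
        enc.append(math.cos(t * freq))
    return enc


def embed_events(events, table, height, width):
    """Map each event to a vector: spatial/polarity lookup plus time encoding."""
    dim = len(table[0])
    vectors = []
    for x, y, p, t in events:
        spatial = table[event_to_token(x, y, p, height, width)]
        temporal = time_encoding(t, dim)
        vectors.append([s + e for s, e in zip(spatial, temporal)])
    return vectors


# Three hypothetical events: (x, y, polarity, timestamp).
events = [(3, 5, 1, 0.0), (3, 6, 0, 12.5), (100, 20, 1, 13.0)]
table = make_embedding_table(height=128, width=128, dim=8)
tokens = embed_events(events, table, height=128, width=128)
print(len(tokens), len(tokens[0]))  # prints "3 8"
```

The output is one fixed-width vector per event, so a variable-length event stream becomes a dense sequence of tokens that a standard Transformer can batch and process in parallel, with no intermediate frame conversion.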
