Recognizing and continually learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics applications. For these applications, both on-device processing and on-device learning are essential for privacy and low-latency adaptation. Event cameras improve the efficiency of visual sensing by producing sparse, asynchronous output that is naturally compatible with neuromorphic processing. Yet no prior system has deployed a continual on-device learning pipeline for event-based action recognition using neuromorphic hardware. We present CLANE, Continual Learning of Actions on Neuromorphic Hardware from Event Cameras, deployed end-to-end on Intel Loihi 2. CLANE combines a spiking 2D CNN for spatiotemporal feature extraction with CLP-SNN as its on-chip learning head, extended to action clips via a Temporal Aggregation Layer and a fixed-point Normalization Layer, both novel Loihi 2 modules. On THU E-ACT-50, a 50-class dataset captured under real-world conditions, CLANE achieves 70.4% accuracy in a continual learning task while consuming more than 100x less energy and running with 16x lower latency than a sequential CNN+GRU+CLP edge GPU baseline, as validated through iso-algorithm cross-platform benchmarking across three evaluation levels.