This paper explores the promising interplay between spiking neural networks (SNNs) and event-based cameras for privacy-preserving human action recognition (HAR). Event cameras capture only the outlines of motion, while SNNs excel at processing spatiotemporal data through spikes, making the two highly synergistic for event-based HAR. Previous studies, however, have been limited by SNNs' restricted ability to process the long-term temporal information essential for precise HAR. In this paper, we introduce two novel frameworks to address this limitation: a temporal segment-based SNN (\textit{TS-SNN}) and a 3D convolutional SNN (\textit{3D-SNN}). The \textit{TS-SNN} extracts long-term temporal information by dividing actions into shorter segments, while the \textit{3D-SNN} replaces 2D spatial components with 3D ones to facilitate the transmission of temporal information. To promote further research in event-based HAR, we also create a dataset, \textit{FallingDetection-CeleX}, collected with the high-resolution $(1280 \times 800)$ CeleX-V event camera and comprising 7 distinct actions. Extensive experimental results show that our proposed frameworks surpass state-of-the-art SNN methods on our newly collected dataset and three other neuromorphic datasets, demonstrating their effectiveness in handling long-range temporal information for event-based HAR.