In skeleton-based action recognition, methods built on Graph Convolutional Networks (GCNs) are limited by their complexity and high energy consumption. Spiking Neural Networks (SNNs) have attracted attention in recent years for their low energy consumption, but existing methods that combine GCNs and SNNs fail to fully exploit the temporal characteristics of skeletal sequences, which increases storage and computational costs. To address this issue, we propose Signal-SGN (Spiking Graph Convolutional Network), which uses the temporal dimension of skeletal sequences as the spiking timestep and treats features as discrete stochastic signals. The core of the network consists of a 1D Spiking Graph Convolutional Network (1D-SGN) and a Frequency Spiking Convolutional Network (FSN). The 1D-SGN performs graph convolution on single frames and relies on spiking network characteristics to capture inter-frame temporal relationships, while the FSN uses the Fast Fourier Transform (FFT) and complex convolution to extract temporal-frequency features. We also introduce a multi-scale wavelet transform feature fusion module (MWTF) to capture spectral features of temporal signals, enhancing the model's classification capability, and a pluggable temporal-frequency spatial semantic feature extraction module (TFSM) that improves the model's ability to distinguish features without increasing inference-phase consumption. Extensive experiments on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets demonstrate that the proposed models not only surpass existing SNN-based methods in accuracy but also reduce computational and storage costs during training. Furthermore, they achieve accuracy competitive with that of the corresponding GCN-based methods.
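The central idea, using each frame as one spiking timestep, can be illustrated with a minimal sketch. It is not the Signal-SGN implementation. It assumes a leaky integrate-and-fire neuron with hard reset and a symmetrically normalized adjacency matrix, neither of which is specified above. The names `normalize_adjacency`, `spiking_graph_conv`, and `dft` are hypothetical. A naive discrete Fourier transform stands in for the FFT (same output, quadratic cost), and plain Python lists replace tensors so that the sketch has no dependencies.

```python
import cmath
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (assumed normalization)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def spiking_graph_conv(frames, adj, weight, tau=2.0, v_threshold=1.0):
    """Graph convolution per frame, followed by a LIF neuron stepped once per frame.

    frames: list of T matrices, each (joints x in_channels)
    weight: matrix (in_channels x out_channels)
    Returns T matrices of 0/1 spikes, each (joints x out_channels).
    """
    a_norm = normalize_adjacency(adj)
    n_joints, c_out = len(adj), len(weight[0])
    # The membrane potential persists across frames; this is what carries temporal context.
    v = [[0.0] * c_out for _ in range(n_joints)]
    spikes = []
    for x_t in frames:  # one skeleton frame == one spiking timestep
        current = matmul(matmul(a_norm, x_t), weight)  # spatial graph convolution on one frame
        frame_spikes = []
        for i in range(n_joints):
            row = []
            for j in range(c_out):
                v[i][j] += (current[i][j] - v[i][j]) / tau  # leaky integration
                fired = v[i][j] >= v_threshold
                if fired:
                    v[i][j] = 0.0  # hard reset after a spike
                row.append(1 if fired else 0)
            frame_spikes.append(row)
        spikes.append(frame_spikes)
    return spikes


def dft(signal):
    """Naive DFT along the time axis, standing in for an FFT."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


# Toy input: 8 frames, a 3-joint chain, 2 input channels per joint.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
frames = [[[math.sin(0.5 * t + j), math.cos(0.5 * t + j)] for j in range(3)] for t in range(8)]
weight = [[1.5, -0.5], [0.5, 1.0]]

spikes = spiking_graph_conv(frames, adj, weight)
# Complex spectrum of channel 0 of joint 0 over time.
spectrum = dft([frame[0][0] for frame in frames])
```

Because the membrane potential is updated once per frame, no extra simulation-time axis has to be stored on top of the sequence length, which is the saving the abstract attributes to treating the temporal dimension as the spiking timestep. The complex spectrum returned by `dft` is the kind of input on which a complex-valued convolution, as in the FSN, would operate.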