In recent years, skeleton-based action recognition using multimodal Graph Convolutional Networks (GCNs) has achieved remarkable results. However, because of their deep structure and reliance on continuous floating-point operations, GCN-based methods are energy-intensive. To address this issue, we propose a Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation (MK-SGN). By combining the energy efficiency of Spiking Neural Networks (SNNs) with the graph representation capability of GCNs, MK-SGN reduces energy consumption while maintaining recognition accuracy. First, we convert GCNs into Spiking Graph Convolutional Networks (SGNs), establishing a new benchmark and paving the way for future research. In this process, we introduce a spiking attention mechanism and design a Spiking-Spatio Graph Convolution module with a Spatial Global Spiking Attention mechanism (SA-SGC), which enhances feature learning. Second, we propose a Spiking Multimodal Fusion module (SMF) that leverages mutual information to process multimodal data more efficiently. Finally, we investigate knowledge distillation from multimodal GCNs to SGNs and propose a novel integrated method that performs both intermediate-layer distillation and soft-label distillation to improve SGN performance. On three challenging skeleton-based action recognition datasets, MK-SGN consumes less energy than state-of-the-art GCN-like frameworks and achieves higher accuracy than state-of-the-art SNN frameworks. Specifically, on the NTU-RGB+D 60 cross-subject split with 4 time steps, our method reduces energy consumption by more than 98% compared with typical GCN-based methods while maintaining competitive accuracy.
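The abstract does not specify how the intermediate-layer and soft-label distillation terms are formulated or weighted. The sketch below is a hedged illustration under stated assumptions, not the MK-SGN loss. It assumes Hinton-style temperature-scaled KL divergence for the soft labels and mean squared error for intermediate features that have already been projected to matching shapes. It also assumes the spiking student's outputs are averaged over its T time steps before the loss is computed. All function names and the weights `alpha`, `beta`, and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence

def average_over_time(outputs_per_step: Sequence[Sequence[float]]) -> List[float]:
    # Assumption: the spiking student's class scores are averaged over T time steps.
    T = len(outputs_per_step)
    return [sum(step[c] for step in outputs_per_step) / T
            for c in range(len(outputs_per_step[0]))]

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); terms where p is zero contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def soft_label_loss(student_logits, teacher_logits, temperature: float = 4.0) -> float:
    # Soft-label distillation: match the teacher's softened class distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

def feature_loss(student_feat, teacher_feat) -> float:
    # Intermediate-layer distillation: MSE between (already aligned) feature vectors.
    if len(student_feat) != len(teacher_feat):
        raise ValueError("features must be projected to the same size first")
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)

def cross_entropy(logits, label: int) -> float:
    # Standard supervised loss on the hard label.
    return -math.log(softmax(logits)[label])

def combined_kd_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                     label: int, alpha: float = 0.5, beta: float = 1.0,
                     temperature: float = 4.0) -> float:
    # Hypothetical weighting: hard-label CE + alpha * soft-label KD + beta * feature KD.
    ce = cross_entropy(student_logits, label)
    kd = soft_label_loss(student_logits, teacher_logits, temperature)
    fd = feature_loss(student_feat, teacher_feat)
    return ce + alpha * kd + beta * fd
```

For example, a student run for 4 time steps would first pass its per-step scores through `average_over_time`, then call `combined_kd_loss` with the GCN teacher's logits and features. When the student matches the teacher exactly, both distillation terms vanish and only the cross-entropy term remains. Scaling the KL term by `temperature ** 2` is the usual convention that keeps its gradient magnitude comparable across temperatures.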