Skeleton-based action recognition has achieved remarkable performance with the development of graph convolutional networks (GCNs). However, most existing methods construct complex topology-learning mechanisms while neglecting the inherent symmetry of the human body. Moreover, their reliance on temporal convolutions with fixed receptive fields limits their ability to capture dependencies across time. To address these issues, we (1) propose a novel Topological Symmetry Enhanced Graph Convolution (TSE-GC) that learns distinct topologies for different channel partitions while incorporating awareness of topological symmetry, and (2) construct a Multi-Branch Deformable Temporal Convolution (MBDTC) for skeleton-based action recognition. The proposed TSE-GC emphasizes the inherent symmetry of the human body while enabling efficient learning of dynamic topologies. Meanwhile, MBDTC introduces deformable modeling, which yields more flexible receptive fields and a stronger capacity to model temporal dependencies. By combining TSE-GC with MBDTC, our final model, TSE-GCN, achieves competitive performance with fewer parameters than state-of-the-art methods on three benchmark datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA. On the cross-subject and cross-set evaluations of NTU RGB+D 120, our model reaches accuracies of 90.0\% and 91.1\%, respectively, with 1.1M parameters and 1.38 GFLOPs for a single stream.
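To make the idea of deformable temporal modeling concrete, the following is a minimal single-channel sketch in Python of a generic 1D deformable convolution. It is not the MBDTC module itself, whose branch layout, offset predictor, and fusion are not specified in this abstract. Each kernel tap samples the input at its regular dilated position plus a fractional offset, using linear interpolation with zero padding, so the effective receptive field is no longer fixed. The offsets are supplied by hand here; in deformable convolution they are usually predicted from the input features by an auxiliary convolution. The multi-branch wrapper assumes, purely for illustration, that branches differ in dilation and that their outputs are kept as separate channels. All function names (`sample_linear`, `deformable_temporal_conv1d`, `multi_branch_deformable`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sample_linear(x: Sequence[float], pos: float) -> float:
    """Linearly interpolate x at a fractional frame index; frames outside [0, T-1] count as zero."""
    lo = math.floor(pos)
    frac = pos - lo
    value = 0.0
    for idx, weight in ((lo, 1.0 - frac), (lo + 1, frac)):
        if weight != 0.0 and 0 <= idx < len(x):
            value += weight * x[idx]
    return value


def deformable_temporal_conv1d(
    x: Sequence[float],
    weights: Sequence[float],
    offsets: Sequence[Sequence[float]],
    dilation: int = 1,
) -> List[float]:
    """Hypothetical single-channel 1D deformable convolution with 'same'-length output.

    offsets[t][k] is the fractional shift added to tap k at output frame t
    (given by hand here; assumed to be learned in a real model).
    """
    center = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            # Regular dilated sampling position, displaced by a per-frame, per-tap offset.
            pos = t + (k - center) * dilation + offsets[t][k]
            acc += w * sample_linear(x, pos)
        out.append(acc)
    return out


def multi_branch_deformable(
    x: Sequence[float],
    branches: Sequence[Tuple[Sequence[float], Sequence[Sequence[float]], int]],
) -> List[List[float]]:
    """Hypothetical multi-branch wrapper: each branch is (weights, offsets, dilation);
    branch outputs are stacked as separate output channels."""
    return [deformable_temporal_conv1d(x, w, off, d) for w, off, d in branches]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    w = [1.0, 1.0, 1.0]
    zero = [[0.0] * 3 for _ in range(len(x))]
    half = [[0.5] * 3 for _ in range(len(x))]
    print(deformable_temporal_conv1d(x, w, zero))  # [1.0, 3.0, 6.0, 9.0, 7.0]
    print(deformable_temporal_conv1d(x, w, half))  # [2.0, 4.5, 7.5, 8.0, 5.5]
    print(multi_branch_deformable(x, [(w, zero, 1), (w, zero, 2)]))
    # [[1.0, 3.0, 6.0, 9.0, 7.0], [2.0, 4.0, 6.0, 4.0, 6.0]]
```

With all offsets set to zero, the layer reduces to an ordinary dilated convolution with zero padding; integer offsets shift every tap by whole frames, and fractional offsets blend neighboring frames, which is what lets the sampling pattern adapt to the motion instead of staying on a fixed grid.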