Skeleton-based action recognition has achieved remarkable performance with the development of graph convolutional networks (GCNs). However, most existing methods focus on constructing complex topology learning mechanisms while neglecting the inherent symmetry of the human body. Additionally, temporal convolutions with fixed receptive fields limit the ability of these methods to capture dependencies in time sequences effectively. To address these issues, we (1) propose a novel Topological Symmetry Enhanced Graph Convolution (TSE-GC) that enables distinct topology learning across different channel partitions while incorporating topological symmetry awareness, and (2) construct a Multi-Branch Deformable Temporal Convolution (MBDTC) for skeleton-based action recognition. The proposed TSE-GC emphasizes the inherent symmetry of the human body while enabling efficient learning of dynamic topologies. Meanwhile, MBDTC introduces deformable modeling, which yields more flexible receptive fields and a stronger capacity to model temporal dependencies. Combining TSE-GC with MBDTC, our final model, TSE-GCN, achieves competitive performance with fewer parameters than state-of-the-art methods on three large datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA. On the cross-subject and cross-set benchmarks of NTU RGB+D 120, our model reaches accuracies of 90.0\% and 91.1\%, with 1.1M parameters and 1.38 GFLOPs per stream.
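To illustrate what "deformable modeling" means for a temporal convolution, the following is a minimal sketch of a generic 1D deformable convolution over time, in the spirit of deformable convolutional networks. It is not the MBDTC implementation, whose details this abstract does not give. Each kernel tap at output frame t samples the input at t + (k - K//2) * dilation + offset, where the fractional offset lets the receptive field shift and stretch; non-integer positions are resolved by linear interpolation. The names `linear_sample` and `deformable_temporal_conv` are hypothetical, the sketch handles a single channel of a single joint, and the offsets are passed in by hand rather than predicted by a learned offset branch as they would be in practice.

```python
import math


def linear_sample(x, pos):
    """Sample the 1D sequence x at a fractional position using linear interpolation.

    Positions outside [0, len(x) - 1] read as zero (zero padding).
    """
    lo = math.floor(pos)
    hi = lo + 1
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(hi)


def deformable_temporal_conv(x, weights, offsets, dilation=1):
    """Hypothetical single-channel deformable convolution along the time axis.

    x:        list of T values (one channel of one joint over time)
    weights:  list of K kernel weights
    offsets:  offsets[t][k] is the fractional shift applied to tap k at frame t
              (assumed given here; a real model would predict them from features)
    """
    K = len(weights)
    center = K // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(K):
            # Regular tap position plus a learned, possibly fractional, displacement.
            pos = t + (k - center) * dilation + offsets[t][k]
            acc += weights[k] * linear_sample(x, pos)
        out.append(acc)
    return out


x = [0, 1, 2, 3, 4]
w = [1.0, 1.0, 1.0]
zero = [[0, 0, 0] for _ in x]
half = [[0.5, 0.5, 0.5] for _ in x]
print(deformable_temporal_conv(x, w, zero))  # reduces to an ordinary temporal conv
print(deformable_temporal_conv(x, w, half))  # every tap shifted by half a frame
```

With all offsets set to zero, the operation reduces to a standard temporal convolution with a fixed receptive field; non-zero offsets let each frame gather context from positions between or beyond the regular grid. A multi-branch design would plausibly run several such kernels in parallel (for example with different `dilation` values) and fuse their outputs, but how MBDTC actually arranges its branches is an assumption not stated in this abstract.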