Skeleton-based action recognition has achieved remarkable results in human action recognition with the development of graph convolutional networks (GCNs). However, recent works tend to construct complex learning mechanisms with redundant training, and they hit a bottleneck on long time series. To solve these problems, we propose the Temporal-Spatio Graph ConvNeXt (TSGCNeXt) to explore an efficient learning mechanism for long temporal skeleton sequences. First, we propose a new graph learning mechanism with a simple structure, Dynamic-Static Separate Multi-graph Convolution (DS-SMG), which aggregates features of multiple independent topological graphs and prevents node information from being ignored during dynamic convolution. Next, we construct a graph convolution training acceleration mechanism that optimizes the back-propagation computation of dynamic graph learning, achieving a 55.08% speed-up. Finally, TSGCNeXt restructures the overall architecture of the GCN with three spatio-temporal learning modules, efficiently modeling long temporal features. Compared with previous methods on the large-scale datasets NTU RGB+D 60 and 120, TSGCNeXt performs best among single-stream networks. In addition, with an exponential moving average (EMA) model introduced into the multi-stream fusion, TSGCNeXt reaches state-of-the-art (SOTA) levels. On the cross-subject and cross-set benchmarks of NTU RGB+D 120, accuracy reaches 90.22% and 91.74%, respectively.
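The abstract mentions an EMA model and multi-stream fusion without giving details. As a rough illustration only, the sketch below shows the generic form of both ideas: an exponential moving average over parameter values, and a weighted sum of per-class scores from several streams. The function names, the decay value, and the fusion weights are hypothetical and are not taken from TSGCNeXt.

```python
# Generic sketch; not the TSGCNeXt implementation.
# All names, the decay value, and the fusion weights are illustrative assumptions.

def ema_update(ema_params, params, decay=0.999):
    """Return EMA parameters after blending in the current parameters once."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def fuse_streams(stream_scores, weights):
    """Weighted sum of per-class scores; one score list per stream."""
    num_classes = len(stream_scores[0])
    fused = [0.0] * num_classes
    for scores, w in zip(stream_scores, weights):
        for c in range(num_classes):
            fused[c] += w * scores[c]
    return fused


def predict(stream_scores, weights):
    """Index of the class with the highest fused score."""
    fused = fuse_streams(stream_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

In such a setup, each stream (for example joint or bone input, again an assumption here) would produce its own score vector, optionally from EMA-averaged weights, and `predict` would pick the class with the largest fused score.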