Graph convolutional networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. However, previous GCN-based methods rely heavily on elaborate human body priors and complex feature aggregation mechanisms, which limits the generalizability of the networks. To address these problems, we propose a novel Spatial Topology Gating Unit (STGU), an MLP-based variant without extra priors, to capture the co-occurrence topology features that encode the spatial dependency across all joints. In STGU, to model sample-specific and completely independent point-wise topology attention, we introduce a new gate-based feature interaction mechanism that activates the features point-to-point using an attention map generated from the input. Based on the STGU, we propose the first topology-aware MLP-based model, Ta-MLP, for skeleton-based action recognition. Compared with previous methods on three large-scale datasets, Ta-MLP achieves competitive performance while reducing the number of parameters by up to 62.5%. Compared with previous state-of-the-art (SOTA) approaches, Ta-MLP pushes the frontier of real-time action recognition. The code will be available at https://github.com/BUPTSJZhang/Ta-MLP.
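To make the gating idea concrete, the following is a minimal sketch of a gate-based interaction across joints for a single frame. It is not the authors' STGU implementation: the function name `gating_unit_sketch`, the channel split, the sigmoid gate, and the dense joint-mixing matrix `w_joint` are all assumptions, loosely modelled on a gMLP-style spatial gating unit. The only properties taken from the abstract are that no skeleton adjacency prior is used and that the attention map is generated from the input and applied point-wise.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gating_unit_sketch(x, w_joint, b_joint):
    """Hypothetical gating unit for one frame (not the paper's exact STGU).

    x:       V x C nested list (V joints, C channels, C even)
    w_joint: V x V dense mixing weights, learned in a real model;
             no skeleton adjacency prior is imposed on them
    b_joint: length-V bias
    Returns a V x (C // 2) nested list.
    """
    num_joints = len(x)
    half = len(x[0]) // 2
    out = []
    for u in range(num_joints):
        row = []
        for c in range(half):
            # Mix the second half of the channels across all joints.
            mixed = b_joint[u] + sum(
                w_joint[u][v] * x[v][half + c] for v in range(num_joints)
            )
            # The attention value depends on the input sample and is
            # applied to one feature element at a time (point-wise).
            row.append(x[u][c] * sigmoid(mixed))
        out.append(row)
    return out


# Toy usage with random values; V and C are arbitrary illustrative sizes.
rng = random.Random(0)
V, C = 5, 4
x = [[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(V)]
w_joint = [[rng.gauss(0.0, 0.1) for _ in range(V)] for _ in range(V)]
b_joint = [0.0] * V
out = gating_unit_sketch(x, w_joint, b_joint)
print(len(out), len(out[0]))  # 5 2
```

In this sketch each output element is one input feature scaled by its own gate value, which is one way to read "point-to-point" activation. How the paper actually generates the attention map may differ.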