Graph convolutional networks (GCNs) are the most commonly used methods for skeleton-based action recognition and have achieved remarkable performance. Generating adjacency matrices with semantically meaningful edges is particularly important for this task, but extracting such edges is a challenging problem. To solve this, we propose a hierarchically decomposed graph convolutional network (HD-GCN) architecture with a novel hierarchically decomposed graph (HD-Graph). The proposed HD-GCN decomposes every joint node into several sets to extract the major structurally adjacent and distant edges, and uses them to construct an HD-Graph that contains those edges in the same semantic spaces of a human skeleton. In addition, we introduce an attention-guided hierarchy aggregation (A-HA) module to highlight the dominant hierarchical edge sets of the HD-Graph. Furthermore, we apply a new six-way ensemble method, which uses only joint and bone streams without any motion stream. The proposed model achieves state-of-the-art performance on four large, popular datasets. Finally, we demonstrate the effectiveness of our model with various comparative experiments.
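As a rough illustration of the decomposition idea, the sketch below groups the joints of a toy skeleton into hierarchy sets and fully connects each pair of neighboring sets. It is a minimal sketch under stated assumptions, not the HD-GCN implementation: the 7-joint skeleton, the grouping by breadth-first depth from a chosen root, and the names `hierarchy_sets` and `hierarchical_adjacency` are all hypothetical and chosen only for illustration.

```python
from collections import deque


def hierarchy_sets(edges, num_joints, root):
    """Group joints by BFS depth from `root`.

    Assumption: depth-based grouping is an illustrative rule only, and the
    skeleton is assumed to be connected.
    """
    neighbors = [[] for _ in range(num_joints)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    depth = [-1] * num_joints
    depth[root] = 0
    queue = deque([root])
    while queue:
        joint = queue.popleft()
        for nxt in neighbors[joint]:
            if depth[nxt] < 0:
                depth[nxt] = depth[joint] + 1
                queue.append(nxt)

    levels = [[] for _ in range(max(depth) + 1)]
    for joint, d in enumerate(depth):
        levels[d].append(joint)
    return levels


def hierarchical_adjacency(levels, num_joints):
    """Build one symmetric 0/1 matrix per pair of neighboring hierarchy sets.

    Every joint in one set is linked to every joint in the next set, so each
    matrix holds physically adjacent edges as well as distant ones.
    """
    matrices = []
    for upper, lower in zip(levels, levels[1:]):
        adj = [[0] * num_joints for _ in range(num_joints)]
        for u in upper:
            for v in lower:
                adj[u][v] = adj[v][u] = 1
        matrices.append(adj)
    return matrices


# Hypothetical toy skeleton: 0 torso, 1 neck, 2 head,
# 3 left elbow, 4 left hand, 5 right elbow, 6 right hand.
TOY_EDGES = [(0, 1), (1, 2), (1, 3), (3, 4), (1, 5), (5, 6)]

levels = hierarchy_sets(TOY_EDGES, num_joints=7, root=0)
graphs = hierarchical_adjacency(levels, num_joints=7)
```

In this toy case the last matrix links the left elbow to the left hand, which are physically connected, and also the head to the left hand, which are not. A learned attention weight per matrix could then emphasize the most useful edge sets, in the spirit of the A-HA module.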