Graph Convolutional Networks (GCNs) have long defined the state of the art in skeleton-based action recognition, leveraging their ability to capture the complex dynamics of human joint topology through the graph's adjacency matrix. However, an inherent flaw has come to light in these models: they tend to optimize the adjacency matrix jointly with the model weights. This process, while seemingly efficient, causes a gradual decay of bone connectivity information, culminating in a model that is indifferent to the very topology it sought to represent. As a remedy, we propose a threefold strategy: (1) We introduce a new way to encode bone connectivity based on graph distances, which preserves the topological information often lost in conventional GCNs. (2) We highlight an often-overlooked feature, the temporal mean of a skeletal sequence, which, despite its simplicity, carries highly action-specific information. (3) Our investigation reveals strong variations in joint-to-joint relationships across different actions. This finding exposes the limitations of a single adjacency matrix in capturing the varied relational configurations characteristic of human movement, which we remedy by proposing an efficient refinement of Graph Convolutions (GC), the BlockGC. BlockGC reduces the parameter count by more than 40% while improving performance over the original GCNs. Our full model, BlockGCN, establishes new standards in skeleton-based action recognition for small model sizes. Its high accuracy, notably on the large-scale NTU RGB+D 120 dataset, stands as compelling proof of the efficacy of BlockGCN. The source code and model can be found at https://github.com/ZhouYuxuanYX/BlockGCN.
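The three ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how graph distances are turned into features, so only the raw hop distances are computed here. The function names, the 5-joint toy skeleton, and the reading of BlockGC as a block-diagonal (grouped) channel-mixing weight are all assumptions made for illustration.

```python
from collections import deque


def hop_distances(num_joints, bones):
    """All-pairs shortest-path (hop) distances on an undirected skeleton graph."""
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [[-1] * num_joints for _ in range(num_joints)]  # -1 marks "unreachable"
    for source in range(num_joints):
        dist[source][source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[source][v] < 0:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def temporal_mean(sequence):
    """Average every joint coordinate over time; sequence[t][j] is a coordinate tuple."""
    num_frames = len(sequence)
    num_joints = len(sequence[0])
    dims = len(sequence[0][0])
    return [
        [sum(frame[j][d] for frame in sequence) / num_frames for d in range(dims)]
        for j in range(num_joints)
    ]


def channel_mixing_params(c_in, c_out, groups=1):
    """Weight count of a channel-mixing matrix split into `groups` diagonal blocks.

    Assumption: this block-diagonal view is one plausible reading of "BlockGC".
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups)


# Hypothetical toy skeleton: hip(0)-spine(1)-neck(2), neck-left hand(3), neck-right hand(4).
TOY_BONES = [(0, 1), (1, 2), (2, 3), (2, 4)]
distances = hop_distances(5, TOY_BONES)
```

Unlike a learned adjacency matrix, the hop distances depend only on the fixed bone list, so they cannot drift during training. The parameter count covers a single weight matrix under an assumed group count; it is not meant to reproduce the 40% figure reported for the full model.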