In skeleton-based action recognition, a key challenge is distinguishing between actions with similar joint trajectories, owing to the lack of image-level detail in skeletal representations. Recognizing that the differentiation of similar actions depends on subtle motion details in specific body parts, our approach focuses on the fine-grained motion of local skeleton components. To this end, we introduce ProtoGCN, a Graph Convolutional Network (GCN)-based model that decomposes the dynamics of entire skeleton sequences into a combination of learnable prototypes representing the core motion patterns of action units. By contrasting prototype reconstructions, ProtoGCN can effectively identify and enhance the discriminative representation of similar actions. Without bells and whistles, ProtoGCN achieves state-of-the-art performance on multiple benchmark datasets, including NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM, demonstrating the effectiveness of the proposed method. The code is available at https://github.com/firework8/ProtoGCN.
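The prototype decomposition described above can be sketched as follows. This is a minimal, hypothetical illustration of the general idea (expressing a motion feature as a softmax-weighted combination of a learnable prototype bank), not the paper's actual architecture; all names, shapes, and the weighting scheme are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 8, 16  # hypothetical: number of prototypes, feature dimension
# Prototype bank representing core motion patterns; learned end-to-end in practice,
# randomly initialized here for illustration.
prototypes = rng.standard_normal((K, D))

def prototype_reconstruct(feature):
    """Express a motion feature as a convex combination of prototypes.

    Returns the reconstructed feature and the combination weights,
    which indicate how strongly each motion pattern is activated.
    """
    scores = prototypes @ feature          # similarity to each prototype, shape (K,)
    coeffs = np.exp(scores - scores.max()) # numerically stable softmax
    coeffs /= coeffs.sum()                 # weights sum to 1
    return coeffs @ prototypes, coeffs     # reconstruction (D,), weights (K,)

feature = rng.standard_normal(D)
recon, weights = prototype_reconstruct(feature)
```

Two similar actions would yield similar weight vectors; a contrastive objective over such reconstructions could then push their representations apart, which is the intuition the abstract points to.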