In the field of skeleton-based action recognition, current top-performing graph convolutional networks (GCNs) exploit intra-sequence context to construct adaptive graphs for feature aggregation. However, we argue that such context is still \textit{local}, since the rich cross-sequence relations have not been explicitly investigated. In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (\textit{SkeletonGCL}) to explore the \textit{global} context across all sequences. Specifically, SkeletonGCL associates graph learning across sequences by enforcing the learned graphs to be class-discriminative, \emph{i.e.,} intra-class compact and inter-class dispersed, which improves the ability of GCNs to distinguish various action patterns. In addition, two memory banks are designed to enrich cross-sequence context at two complementary levels, \emph{i.e.,} the instance and semantic levels, enabling graph contrastive learning at multiple context scales. Consequently, SkeletonGCL establishes a new training paradigm that can be seamlessly incorporated into current GCNs. Without loss of generality, we combine SkeletonGCL with three GCNs (2S-AGCN, CTR-GCN, and InfoGCN) and achieve consistent improvements on the NTU60, NTU120, and NW-UCLA benchmarks. The source code will be available at \url{https://github.com/OliverHxh/SkeletonGCL}.
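To make the idea of class-discriminative graph learning with two memory banks concrete, the following is a minimal, hypothetical sketch. It is not the authors' implementation. It assumes that each learned graph can be flattened into a vector. It also assumes an instance-level bank implemented as a FIFO queue of past graphs with their labels, and a semantic-level bank implemented as one momentum-averaged prototype per class. As a stand-in for the paper's exact objective, it uses a supervised-contrastive (SupCon-style) loss. The names `GraphMemory`, `supervised_contrastive_loss`, and `skeleton_gcl_loss`, along with the values of `tau`, `queue_size`, and `momentum`, are illustrative assumptions. A real implementation would operate on batched tensors produced by a GCN.

```python
import math
from collections import deque


def normalize(v):
    """L2-normalize a vector (here: a flattened learned graph)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class GraphMemory:
    """Hypothetical two-level memory (assumed design, not the paper's code):
    an instance-level FIFO queue of past graphs and a semantic-level
    per-class momentum-averaged prototype."""

    def __init__(self, num_classes, dim, queue_size=64, momentum=0.9):
        self.instances = deque(maxlen=queue_size)  # (graph, label) pairs
        self.prototypes = [[0.0] * dim for _ in range(num_classes)]
        self.momentum = momentum

    def update(self, graph, label):
        g = normalize(graph)
        self.instances.append((g, label))
        p = self.prototypes[label]
        # Momentum update of the class prototype, then re-normalize.
        self.prototypes[label] = normalize(
            [self.momentum * a + (1 - self.momentum) * b for a, b in zip(p, g)]
        )


def supervised_contrastive_loss(anchor, label, keys, tau=0.1):
    """SupCon-style loss: pull same-class graphs together (intra-class
    compact) and push other classes apart (inter-class dispersed)."""
    z = normalize(anchor)
    logits = [dot(z, k) / tau for k, _ in keys]
    m = max(logits)
    # Numerically stable log-sum-exp over all keys in the bank.
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    pos = [l - log_denom for l, (_, y) in zip(logits, keys) if y == label]
    if not pos:
        return 0.0  # no positive keys in the bank for this class yet
    return -sum(pos) / len(pos)


def skeleton_gcl_loss(graph, label, memory, tau=0.1):
    """Hypothetical total: instance-level term plus semantic-level term."""
    inst = supervised_contrastive_loss(graph, label, list(memory.instances), tau)
    sem_keys = [(p, c) for c, p in enumerate(memory.prototypes)]
    sem = supervised_contrastive_loss(graph, label, sem_keys, tau)
    return inst + sem


memory = GraphMemory(num_classes=2, dim=4)
memory.update([1.0, 0.0, 0.0, 0.0], 0)
memory.update([0.0, 1.0, 0.0, 0.0], 1)
loss_match = skeleton_gcl_loss([0.9, 0.1, 0.0, 0.0], 0, memory)
loss_mismatch = skeleton_gcl_loss([0.1, 0.9, 0.0, 0.0], 0, memory)
print(loss_match < loss_mismatch)  # True
```

In this toy usage, a class-0 graph that resembles the stored class-0 entries receives a much lower loss than one that resembles class 1. This is the class-discriminative pressure that the abstract describes. Because the two banks supply complementary keys, the instance term contrasts the graph against individual past sequences, while the semantic term contrasts it against class-level summaries.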