Contrastive learning has proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods rely on carefully designed augmentations to generate different movement patterns of skeletons that share the same semantics. However, applying strong augmentations, which distort the structures of images/skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of adopting strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy that generates multiple ordered positive pairs, which guide the learned representations toward consistency across different views. Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation in the feature space, pulling the representations of strongly augmented views closer to those of weakly augmented views for better generalizability. Meanwhile, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD.
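The two mechanisms above (ordered views from a gradually growing augmentation chain, and an asymmetric loss that pulls stronger views toward weaker ones) can be illustrated with a minimal, framework-free sketch. Everything concrete here is an assumption for illustration, not HiCLR's actual implementation: the specific augmentations (global scaling, Gaussian noise, joint masking), the hypothetical `toy_encoder`, the memory bank of random unit vectors, the temperature `tau`, the choice of a KL divergence between similarity distributions as the asymmetric loss, and the pairing of each view with its immediately weaker predecessor.

```python
import math
import random

# A skeleton sequence: frames x joints x (x, y, z) coordinates.
Seq = list[list[list[float]]]


def basic_aug(seq: Seq, rng: random.Random) -> Seq:
    """Hypothetical weak augmentation: one random global scale factor."""
    s = rng.uniform(0.9, 1.1)
    return [[[c * s for c in joint] for joint in frame] for frame in seq]


def noise_aug(seq: Seq, rng: random.Random, std: float = 0.05) -> Seq:
    """Hypothetical strong augmentation: i.i.d. Gaussian noise on every coordinate."""
    return [[[c + rng.gauss(0.0, std) for c in joint] for joint in frame] for frame in seq]


def joint_mask_aug(seq: Seq, rng: random.Random, p: float = 0.2) -> Seq:
    """Hypothetical strong augmentation: zero out randomly chosen joints in all frames."""
    masked = {j for j in range(len(seq[0])) if rng.random() < p}
    return [[[0.0, 0.0, 0.0] if j in masked else list(joint)
             for j, joint in enumerate(frame)] for frame in seq]


def gradual_views(seq: Seq, strong_augs, rng: random.Random) -> list[Seq]:
    """Ordered views: each view stacks one more strong augmentation on the previous one."""
    views = [basic_aug(seq, rng)]
    for aug in strong_augs:
        views.append(aug(views[-1], rng))
    return views


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def toy_encoder(seq: Seq) -> list[float]:
    """Hypothetical stand-in for a backbone: temporal mean of each joint coordinate, L2-normalized."""
    t = len(seq)
    feat = [sum(frame[j][c] for frame in seq) / t
            for j in range(len(seq[0])) for c in range(3)]
    return normalize(feat)


def similarity_dist(z, bank, tau):
    """Softmax over cosine similarities between z and each memory-bank entry."""
    logits = [sum(a * b for a, b in zip(z, k)) / tau for k in bank]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_loss(z_weak, z_strong, bank, tau=0.1):
    """KL(p_weak || p_strong). In an autograd framework p_weak would be detached
    (stop-gradient), so only the strong view is pulled toward the weak one."""
    p_w = similarity_dist(z_weak, bank, tau)
    p_s = similarity_dist(z_strong, bank, tau)
    return sum(pw * (math.log(pw) - math.log(ps)) for pw, ps in zip(p_w, p_s))


def hierarchical_loss(views, bank, tau=0.1):
    """Assumed pairing: each view is pulled toward its immediately weaker predecessor."""
    zs = [toy_encoder(v) for v in views]
    return sum(asymmetric_loss(zs[i], zs[i + 1], bank, tau) for i in range(len(zs) - 1))


# Usage on random data: 8 frames, 25 joints, a bank of 16 random unit vectors.
rng = random.Random(0)
seq = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(25)] for _ in range(8)]
views = gradual_views(seq, [noise_aug, joint_mask_aug], rng)
dim = len(toy_encoder(seq))
bank = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(16)]
loss = hierarchical_loss(views, bank)
print(f"hierarchical loss: {loss:.4f}")
```

The asymmetry comes from treating the weaker view's distribution as the fixed target: KL(p_weak || p_strong) differs from KL(p_strong || p_weak), and with a stop-gradient on the weak branch only the strongly augmented representation moves, which matches the "directional" pull described above.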