Skeleton-based action recognition has attracted considerable attention owing to its compactness and robustness. However, the small inter-class variation among similar action sequences often leads to confusion. The inherent coupling of spatial and temporal information makes it difficult to mine the subtle differences in joint motion trajectories, which are critical for distinguishing easily confused fine-grained actions. To alleviate this problem, we propose a Wavelet-Attention Decoupling (WAD) module that uses the discrete wavelet transform to disentangle salient and subtle motion features in the time-frequency domain; a decoupling attention then adaptively recalibrates their temporal responses. To further amplify the discrepancies among these subtle motion features, we propose a Fine-grained Contrastive Enhancement (FCE) module that strengthens attention to trajectory features through contrastive learning. We conduct extensive experiments on the coarse-grained dataset NTU RGB+D and the fine-grained dataset FineGYM. Our method performs competitively with state-of-the-art methods and discriminates confusing fine-grained actions well.
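To make the decoupling idea concrete, the following is a minimal sketch of a single-level discrete wavelet transform applied along the time axis of a skeleton sequence. It is not the WAD module itself: the choice of the Haar wavelet, the function names `haar_dwt_time` and `haar_idwt_time`, the input layout `(frames, joints, coordinates)`, and the reading of the low-frequency band as "salient" motion and the high-frequency band as "subtle" motion are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt_time(x):
    """Single-level orthonormal Haar DWT along the time axis (axis 0).

    x: array of shape (T, J, C) with an even number of frames T.
    Returns (approx, detail), each of shape (T // 2, J, C).
    Assumption: approx ~ coarse/salient motion, detail ~ fine/subtle motion.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.shape[0] % 2 != 0:
        raise ValueError("number of frames T must be even")
    even, odd = x[0::2], x[1::2]            # consecutive frame pairs
    approx = (even + odd) / np.sqrt(2.0)    # low-pass (scaled pairwise mean)
    detail = (even - odd) / np.sqrt(2.0)    # high-pass (scaled pairwise difference)
    return approx, detail


def haar_idwt_time(approx, detail):
    """Inverse of haar_dwt_time: rebuilds the (T, J, C) sequence."""
    even = (approx + detail) / np.sqrt(2.0)
    odd = (approx - detail) / np.sqrt(2.0)
    out = np.empty((2 * approx.shape[0],) + approx.shape[1:], dtype=approx.dtype)
    out[0::2], out[1::2] = even, odd        # interleave frames back in order
    return out


# Hypothetical input: 64 frames, 25 joints, 3D coordinates (random values).
rng = np.random.default_rng(0)
skeleton = rng.standard_normal((64, 25, 3))

low, high = haar_dwt_time(skeleton)         # two half-length sub-bands
recon = haar_idwt_time(low, high)           # lossless reconstruction
```

Because the transform is orthonormal, the two sub-bands together carry exactly the information of the input, so a downstream attention mechanism could reweight each band separately before the sequence is recombined.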