One-shot skeleton action recognition, which aims to learn a skeleton action recognition model from a single training sample per class, has attracted increasing interest because collecting and annotating large-scale skeleton action data is challenging. However, most existing studies match skeleton sequences by directly comparing their feature vectors, which neglects the spatial structure and temporal order of skeleton data. This paper presents a novel one-shot skeleton action recognition technique based on multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching, which simultaneously captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales. The second is cross-scale matching, which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments on three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition and consistently outperforms the state of the art by large margins.
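To make the two matching perspectives concrete, here is a minimal sketch that is explicitly not the paper's method. It assumes each sample is a list of frames, each frame a list of per-joint feature vectors. A "scale" is a hypothetical pair (number of contiguous temporal segments, grouping of joints into body parts), and features are average-pooled within each segment and group. Plain cosine similarity stands in for whatever optimal matching the method actually uses. Multi-scale matching compares position-aligned tokens within each scale. Cross-scale matching lets every query token match its most similar support token at any scale. All names, the scales, and the weight `alpha` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Skeleton = List[List[Vector]]  # sample[frame][joint] -> per-joint feature vector
Scale = Tuple[int, Sequence[Sequence[int]]]  # hypothetical: (temporal segments, joint groups)


def mean_vec(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def pool(sample: Skeleton, n_segments: int, groups: Sequence[Sequence[int]]) -> List[Vector]:
    """Average features over contiguous time segments and joint groups.

    Tokens are ordered segment-major, so token k of two samples pooled at the
    same scale covers the same relative time span and the same body part.
    """
    t = len(sample)
    if not 1 <= n_segments <= t:
        raise ValueError("n_segments must be between 1 and the number of frames")
    tokens = []
    for s in range(n_segments):
        frames = sample[s * t // n_segments:(s + 1) * t // n_segments]
        for g in groups:
            tokens.append(mean_vec([frame[j] for frame in frames for j in g]))
    return tokens


def multi_scale_score(query: Skeleton, support: Skeleton, scales: Sequence[Scale]) -> float:
    """Scale-wise matching: compare aligned tokens within each scale, then average over scales."""
    total = 0.0
    for n_seg, groups in scales:
        q, s = pool(query, n_seg, groups), pool(support, n_seg, groups)
        total += sum(cosine(a, b) for a, b in zip(q, s)) / len(q)
    return total / len(scales)


def cross_scale_score(query: Skeleton, support: Skeleton, scales: Sequence[Scale]) -> float:
    """Cross-scale matching: each query token (any scale) takes its best support token (any scale)."""
    q_tokens = [tok for n, g in scales for tok in pool(query, n, g)]
    s_tokens = [tok for n, g in scales for tok in pool(support, n, g)]
    return sum(max(cosine(a, b) for b in s_tokens) for a in q_tokens) / len(q_tokens)


def classify(query: Skeleton, supports: Dict[str, Skeleton], scales: Sequence[Scale],
             alpha: float = 0.5) -> str:
    """One-shot classification: label of the single support sample with the best combined score."""
    def score(support: Skeleton) -> float:
        return (alpha * multi_scale_score(query, support, scales)
                + (1 - alpha) * cross_scale_score(query, support, scales))
    return max(supports, key=lambda label: score(supports[label]))


# Toy data: 4 frames, 2 joints, 2-D features per joint.
scales = [(1, [[0, 1]]), (2, [[0], [1]]), (4, [[0], [1]])]
wave = [[[1.0, 0.1], [0.0, 1.0]] for _ in range(4)]
kick = [[[0.0, 1.0], [1.0, 0.1]] for _ in range(4)]  # same features as wave, joints swapped
query = [[[0.9, 0.2], [0.1, 1.0]] for _ in range(4)]
print(classify(query, {"wave": wave, "kick": kick}, scales))  # prints wave
```

In this toy case, the coarsest scale pools all joints together, so `wave` and `kick` collapse to the same vector and cannot be told apart. The finer, per-joint scales separate them. This echoes the abstract's point that comparing a single global feature vector discards spatial structure. The cross-scale score here is identical for both classes, because the support token sets coincide. In this simplified stand-in it only helps when motion magnitude or speed shifts which scale a pattern appears at.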