Action recognition has recently attracted growing attention because of its wide range of practical applications in intelligent surveillance and human-computer interaction. However, few-shot action recognition has not been well explored and remains challenging because of data scarcity. In this paper, we propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition. Specifically, we divide a complicated action into several sub-actions through carefully designed hierarchical clustering and further decompose the sub-actions into more fine-grained spatially attentional sub-actions (SAS-actions). Although base classes and novel classes differ substantially, they can share similar patterns at the level of sub-actions or SAS-actions. Furthermore, we adopt the Earth Mover's Distance from the transportation problem to measure the similarity between video samples in terms of their sub-action representations. It computes the optimal matching flows between sub-actions and uses them as the distance metric, which is favorable for comparing fine-grained patterns. Extensive experiments show that our method achieves state-of-the-art results on the HMDB51, UCF101 and Kinetics datasets.
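To make the Earth Mover's Distance step concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that both videos are represented by the same number of sub-action vectors, that every sub-action carries uniform weight, and that the ground cost is one minus cosine similarity; none of these choices is stated in the abstract. Under those assumptions the transportation problem reduces to an optimal one-to-one assignment, so a brute-force search over permutations is enough for a handful of sub-actions. The names `cosine_cost` and `emd_uniform` are hypothetical.

```python
import math
from itertools import permutations


def cosine_cost(u, v):
    """Ground cost between two sub-action vectors: 1 - cosine similarity.

    Assumes both vectors are non-zero (a zero vector would divide by zero).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def emd_uniform(query, support):
    """EMD between two equally sized sets of sub-action vectors.

    Assumption: each sub-action has weight 1/n. With uniform weights and
    equal set sizes, an optimal transport plan is a permutation, so we
    search all permutations (only feasible for small n).
    Returns (distance, matching), where matching[i] is the index of the
    support sub-action matched to query sub-action i.
    """
    n = len(query)
    if n == 0 or n != len(support):
        raise ValueError("both videos need the same non-zero number of sub-actions")
    cost = [[cosine_cost(q, s) for s in support] for q in query]
    best_total, best_perm = min(
        (sum(cost[i][perm[i]] for i in range(n)), perm)
        for perm in permutations(range(n))
    )
    return best_total / n, best_perm


# Toy example: two videos, two sub-actions each, 2-D features.
query_video = [[1.0, 0.0], [0.0, 1.0]]
support_video = [[0.0, 2.0], [3.0, 0.0]]
distance, matching = emd_uniform(query_video, support_video)
```

In the toy example the sub-actions of the two videos appear in opposite order, and the matching pairs each query sub-action with the support sub-action pointing in the same direction, which is why a flow-based metric suits fine-grained comparison. Non-uniform weights or unequal set sizes would require a general linear-programming or optimal-transport solver instead of the permutation search.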