This study investigates unsupervised anomaly action recognition, which identifies video-level abnormal human behavior events without using abnormal samples, and simultaneously addresses three limitations of conventional skeleton-based approaches: target domain-dependent DNN training, a lack of robustness against skeleton errors, and a lack of normal samples. We present a unified, user prompt-guided zero-shot learning framework built on a target domain-independent skeleton feature extractor that is pretrained on a large-scale action recognition dataset. Specifically, during the training phase using normal samples, the method models the distribution of skeleton features of normal actions while keeping the DNN weights frozen; in the inference phase, it estimates the anomaly score using this distribution. Additionally, to increase robustness against skeleton errors, we introduce a DNN architecture inspired by the point cloud deep learning paradigm, which sparsely propagates features between joints. Furthermore, to prevent unobserved normal actions from being misidentified as abnormal, we incorporate into the anomaly score a similarity score between the user prompt embeddings and the skeleton features, both aligned in a common space, which indirectly supplements the normal actions. We conduct experiments on two publicly available datasets to test the effectiveness of the proposed method with respect to the above-mentioned limitations.
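The scoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distribution model or how the two scores are combined, so everything here is an assumption made for illustration: a single Gaussian fitted to frozen-extractor features, a squared Mahalanobis distance as the distribution-based score, a maximum cosine similarity to the prompt embeddings, and a subtractive combination controlled by a hypothetical `weight` parameter. The function names are likewise hypothetical, and features and prompt embeddings are assumed to already lie in the same common space with equal dimensionality.

```python
import numpy as np


def fit_normal_distribution(features, eps=1e-3):
    """Fit a single Gaussian to features of normal actions (assumed model).

    features: array of shape (num_samples, dim), produced by a frozen extractor.
    Returns the mean and the inverse of the regularized covariance matrix.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)


def distribution_score(feature, mean, precision):
    """Squared Mahalanobis distance of one feature to the fitted Gaussian."""
    diff = feature - mean
    return float(diff @ precision @ diff)


def prompt_similarity(feature, prompt_embeddings):
    """Maximum cosine similarity between a feature and the prompt embeddings.

    prompt_embeddings: array of shape (num_prompts, dim), assumed to describe
    normal actions in the same space as the skeleton features.
    """
    f = feature / np.linalg.norm(feature)
    p = prompt_embeddings / np.linalg.norm(prompt_embeddings, axis=1, keepdims=True)
    return float(np.max(p @ f))


def anomaly_score(feature, mean, precision, prompt_embeddings, weight=1.0):
    """Hypothetical combination: distance from the normal distribution raises
    the score, similarity to a normal-action prompt lowers it."""
    distance = distribution_score(feature, mean, precision)
    similarity = prompt_similarity(feature, prompt_embeddings)
    return distance - weight * similarity
```

In this sketch, a sample that lies far from the observed normal features but resembles a user-described normal action receives a reduced score, which mirrors the stated goal of not flagging unobserved normal actions as abnormal. The actual method may use a different distribution model and a different combination rule.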