Human behavior anomaly detection aims to identify unusual human actions and plays a crucial role in intelligent surveillance and related applications. Current mainstream methods still rely on reconstruction or future frame prediction. However, reconstructing or predicting low-level pixel features tends to give the network overly strong generalization ability, so anomalies can be reconstructed or predicted as well as normal data. Unlike these methods, and inspired by the Student-Teacher Network, we propose a novel framework called the Multilevel Guidance-Exploration Network (MGENet), which detects anomalies through the difference in high-level representations between the Guidance and Exploration networks. Specifically, we first use a pre-trained Normalizing Flow, which takes skeletal keypoints as input, to guide an RGB encoder, which takes unmasked RGB frames as input, to explore latent motion features. Then, the RGB encoder guides a mask encoder, which takes masked RGB frames as input, to explore latent appearance features. Additionally, we design a Behavior-Scene Matching Module (BSMM) to detect scene-related behavioral anomalies. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on the ShanghaiTech and UBnormal datasets, with AUCs of 86.9% and 73.5%, respectively. The code will be available at https://github.com/molu-ggg/GENet.
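The abstract scores anomalies by the discrepancy between guidance and exploration representations at two levels: motion (Normalizing Flow guiding the RGB encoder) and appearance (RGB encoder guiding the mask encoder). The following is a minimal sketch of that scoring idea under stated assumptions. The per-frame feature shapes, the normalized squared distance, the equal level weights, and the function names (`level_discrepancy`, `anomaly_score`) are hypothetical and not taken from the MGENet code. The BSMM is not modeled.

```python
import numpy as np


def _l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row (one feature vector per frame) to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def level_discrepancy(guide: np.ndarray, explore: np.ndarray) -> np.ndarray:
    """Per-frame squared distance between normalized guidance and exploration features.

    guide, explore: arrays of shape (num_frames, feature_dim).
    Returns shape (num_frames,), with values in [0, 4] after normalization.
    (Hypothetical distance choice; the paper's exact metric may differ.)
    """
    g = _l2_normalize(guide)
    e = _l2_normalize(explore)
    return np.sum((g - e) ** 2, axis=-1)


def anomaly_score(
    flow_latent: np.ndarray,      # guidance at the motion level (assumed NF latent)
    rgb_motion: np.ndarray,       # exploration at the motion level (RGB encoder)
    rgb_appearance: np.ndarray,   # guidance at the appearance level (RGB encoder)
    mask_appearance: np.ndarray,  # exploration at the appearance level (mask encoder)
    w_motion: float = 0.5,        # hypothetical level weights
    w_appearance: float = 0.5,
) -> np.ndarray:
    """Combine the two guidance-exploration discrepancies into one score per frame."""
    motion = level_discrepancy(flow_latent, rgb_motion)
    appearance = level_discrepancy(rgb_appearance, mask_appearance)
    return w_motion * motion + w_appearance * appearance


if __name__ == "__main__":
    guide = np.array([[1.0, 0.0], [0.0, 2.0]])
    # Frame 0: exploration matches guidance (normal). Frame 1: it points the opposite way.
    explore = np.array([[1.0, 0.0], [0.0, -2.0]])
    print(anomaly_score(guide, explore, guide, explore))
```

The intuition is the one the abstract states. The exploration networks are trained only on normal data to match their guides, so frames where they fail to match receive higher scores. Normalizing before comparing makes the score insensitive to feature magnitude. That is a design assumption of this sketch.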