Video anomaly detection aims to determine whether a given video contains abnormal events, behaviors, or objects, which enables effective and intelligent public safety management. Because labeling video anomalies is both time-consuming and expensive, most existing works employ unsupervised or weakly supervised learning methods. This paper focuses on weakly supervised video anomaly detection, in which each training video is labeled only as to whether it contains an anomaly, with no information about which frames the anomaly occupies. However, the uncertainty of weakly labeled data and large model sizes prevent existing methods from being widely deployed in real scenarios, especially in resource-limited settings such as edge computing. In this paper, we develop a lightweight video anomaly detection model. On the one hand, we propose an adaptive instance selection strategy that selects confident instances based on the model's current status, thereby mitigating the uncertainty of weakly labeled data and improving the model's performance. On the other hand, we design a lightweight multi-level temporal correlation attention module and an hourglass-shaped fully connected layer to construct the model, which reduces the number of model parameters to only 0.56\% of that of existing methods (e.g., RTFM). Extensive experiments on two public datasets, UCF-Crime and ShanghaiTech, show that our model achieves AUC scores comparable or even superior to those of state-of-the-art methods with significantly fewer parameters.
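As a rough illustration of why an hourglass-shaped fully connected layer can shrink a model, the sketch below counts the parameters of one wide layer against a wide-narrow-wide pair. It assumes that "hourglass-shaped" means a bottleneck between two wider ends, which the abstract does not spell out. The helper names and the sizes (2048 and 64) are hypothetical and are not taken from the paper.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of a single fully connected layer."""
    return d_in * d_out + d_out


def hourglass_params(d: int, bottleneck: int) -> int:
    """Two stacked layers, d -> bottleneck -> d (assumed hourglass shape)."""
    return dense_params(d, bottleneck) + dense_params(bottleneck, d)


# Hypothetical sizes chosen only for illustration; not values from the paper.
d, bottleneck = 2048, 64
full = dense_params(d, d)                 # one wide d -> d layer
narrow = hourglass_params(d, bottleneck)  # bottlenecked replacement
ratio = narrow / full
print(f"{full} vs {narrow} parameters ({ratio:.2%} of the wide layer)")
```

The saving grows as the bottleneck narrows relative to the feature width, since the count drops from roughly d squared to roughly 2 times d times the bottleneck width.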