We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices. Our code is freely available at: https://github.com/ristea/fast-aed.
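The joint objective described in the abstract (standard distillation plus adversarial distillation, with one discriminator per teacher) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the choice of mean squared error for the standard term, the non-saturating GAN formulation for the adversarial term, the `adv_weight` value, and all function names are assumptions made for illustration only. The discriminators are represented only by their output probabilities.

```python
import math

EPS = 1e-12  # numerical guard for log(0)


def mse(pred, target):
    """Mean squared error between two equally sized 2-D maps (lists of rows)."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def student_loss(student_maps, teacher_maps, disc_scores_on_student, adv_weight=0.1):
    """Joint student objective, summed over all teachers (assumed form).

    student_maps[i]           : student's low-resolution anomaly map for teacher i
    teacher_maps[i]           : target anomaly map produced by teacher i
    disc_scores_on_student[i] : probability in (0, 1) that discriminator i
                                assigns to the student's map being a teacher map
    adv_weight                : hypothetical weight of the adversarial term
    """
    # Standard distillation: match each teacher's anomaly map directly.
    standard = sum(mse(s, t) for s, t in zip(student_maps, teacher_maps))
    # Adversarial distillation: the student is rewarded when each
    # discriminator mistakes its map for a teacher map.
    adversarial = sum(-math.log(d + EPS) for d in disc_scores_on_student)
    return standard + adv_weight * adversarial


def discriminator_loss(score_on_teacher, score_on_student):
    """Binary cross-entropy for one per-teacher discriminator (assumed form):
    teacher (target) maps are labeled 1, student (generated) maps 0."""
    return -math.log(score_on_teacher + EPS) - math.log(1.0 - score_on_student + EPS)
```

In a training loop built on this sketch, the student and the discriminators would be updated in alternation: each discriminator minimizes `discriminator_loss` on its own teacher's maps, while the student minimizes `student_loss` across all teachers at once.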