We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing one adversarial discriminator per teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining results comparable to those of recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, owing to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices. Our code is freely available at: https://github.com/ristea/fast-aed.
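As a rough illustration of the joint objective described in the abstract, the sketch below combines a standard distillation term with an adversarial term, using one discriminator per teacher. The abstract does not specify the loss functions, so the mean squared error, the negative log-likelihood adversarial term, the weight `adv_weight`, and all function names are assumptions made for this example, not the formulation used in the released code.

```python
import math
from typing import Callable, List, Sequence

# A low-resolution anomaly map, represented as a nested list of scores.
AnomalyMap = List[List[float]]


def mse(student_map: AnomalyMap, teacher_map: AnomalyMap) -> float:
    """Standard distillation term (assumed here to be a mean squared error)."""
    total, count = 0.0, 0
    for s_row, t_row in zip(student_map, teacher_map):
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1
    return total / count


def generator_adversarial_loss(disc_prob: float, eps: float = 1e-7) -> float:
    """Adversarial term for the student (assumed form).

    disc_prob is the discriminator's estimated probability that the
    student's map is a genuine teacher map; the student is rewarded
    when this probability is high.
    """
    p = min(max(disc_prob, eps), 1.0 - eps)  # clamp to keep log finite
    return -math.log(p)


def student_loss(
    student_maps: Sequence[AnomalyMap],
    teacher_maps: Sequence[AnomalyMap],
    discriminators: Sequence[Callable[[AnomalyMap], float]],
    adv_weight: float = 0.1,  # hypothetical weighting, not from the paper
) -> float:
    """Sum the standard and adversarial terms over all teachers.

    The i-th student output is matched against the i-th teacher map and
    scored by the i-th discriminator (one discriminator per teacher).
    """
    total = 0.0
    for s_map, t_map, disc in zip(student_maps, teacher_maps, discriminators):
        total += mse(s_map, t_map)
        total += adv_weight * generator_adversarial_loss(disc(s_map))
    return total
```

In a full training loop each discriminator would be updated in alternation with the student, learning to separate teacher maps from student maps; that step is omitted here for brevity.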