This paper investigates the performance of diffusion models for video anomaly detection (VAD) in the most challenging, yet most operationally relevant, scenario: one in which data annotations are not used. Because abnormal events are sparse, diverse, contextual, and often ambiguous, detecting them precisely is a very ambitious task. To this end, we rely only on the information-rich spatio-temporal data and on the reconstruction power of diffusion models, using a high reconstruction error as the indicator of abnormality. Experiments on two large-scale video anomaly detection datasets demonstrate that the proposed method consistently improves over state-of-the-art generative models, and in some cases it achieves better scores than more complex models. This is the first study to use a diffusion model, and to examine the influence of its parameters, to provide guidance for VAD in surveillance scenarios.
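As an illustration of the reconstruction-error criterion described above, the following minimal sketch scores frames by how poorly they are reconstructed. It is not the paper's implementation: the diffusion model itself is omitted, the reconstructions are assumed to be given, and the function names (`frame_errors`, `anomaly_scores`, `flag_anomalies`), the use of mean squared error, the min-max normalisation, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Sequence


def frame_errors(original: Sequence[Sequence[float]],
                 reconstructed: Sequence[Sequence[float]]) -> List[float]:
    """Mean squared error per frame.

    Each frame is a flat sequence of pixel values. In practice the
    reconstructions would come from a generative model; here they are
    simply passed in (assumption for illustration).
    """
    errors = []
    for x, x_hat in zip(original, reconstructed):
        if len(x) != len(x_hat):
            raise ValueError("frame and reconstruction differ in size")
        errors.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return errors


def anomaly_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalise errors to [0, 1] (an assumed scoring choice)."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames reconstructed equally well: no frame stands out.
        return [0.0] * len(errors)
    return [(e - lo) / (hi - lo) for e in errors]


def flag_anomalies(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark frames whose score exceeds a hypothetical threshold."""
    return [s > threshold for s in scores]


if __name__ == "__main__":
    # Toy data: the third frame is reconstructed badly.
    frames = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
    recons = [[0.0, 0.1], [1.0, 1.0], [0.0, 0.0]]
    scores = anomaly_scores(frame_errors(frames, recons))
    print(flag_anomalies(scores))
```

Because no annotations are used, the threshold cannot be tuned on labelled data; in this sketch it is simply a fixed value, and any real system would need its own way of choosing it.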