This paper investigates the performance of diffusion models for video anomaly detection (VAD) in the most challenging, yet most practical, setting: the unsupervised one, in which no data annotations are used. Because abnormal events are sparse, diverse, contextual, and often ambiguous, detecting them precisely is a very ambitious task. To this end, we rely only on information-rich spatio-temporal data and on the reconstruction capability of diffusion models, treating a high reconstruction error as evidence of abnormality. Experiments on two large-scale video anomaly detection datasets show that the proposed method consistently outperforms state-of-the-art generative models and, in some cases, achieves better scores than more complex models. This is the first study to apply a diffusion model to VAD and to examine the influence of its parameters, providing guidance for VAD in surveillance scenarios.
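The abstract does not detail the scoring pipeline, so the following is only a minimal sketch of reconstruction-error scoring with a diffusion model. It assumes a DDPM-style noise predictor, a linear beta schedule, a single noising step `t`, and a one-step estimate of $x_0$ instead of the full reverse chain. Clips are flattened to plain float lists to keep the example dependency-free. `make_toy_predictor` is a hypothetical stand-in for a trained network, and the min-max normalization is a common VAD post-processing choice, not necessarily the authors'.

```python
import math
import random
from typing import Callable, List, Sequence

# A flattened spatio-temporal clip (e.g. stacked frame pixels) as plain floats.
Clip = Sequence[float]
# Noise predictor eps_theta(x_t, t); in practice a trained network.
NoisePredictor = Callable[[List[float], int], List[float]]


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def anomaly_score(x0: Clip, predict_noise: NoisePredictor,
                  alpha_bars: List[float], t: int, rng: random.Random) -> float:
    """Reconstruction error of x0 after noising to step t and denoising once."""
    a = alpha_bars[t]
    # Forward process in closed form: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
    eps_hat = predict_noise(xt, t)
    # One-step estimate of x0 from the predicted noise (assumption: no full reverse chain).
    x0_hat = [(v - math.sqrt(1.0 - a) * e) / math.sqrt(a)
              for v, e in zip(xt, eps_hat)]
    # Mean squared reconstruction error: high values suggest an anomaly.
    return sum((p - q) ** 2 for p, q in zip(x0, x0_hat)) / len(x0)


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale per-video scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    span = hi - lo if hi > lo else 1.0
    return [(s - lo) / span for s in scores]


def make_toy_predictor(normal: Clip, alpha_bars: List[float]) -> NoisePredictor:
    """Hypothetical stand-in for a model trained only on normal data: it assumes
    every normal clip equals `normal`, so its x0 estimate is always that pattern."""
    def predict(xt: List[float], t: int) -> List[float]:
        a = alpha_bars[t]
        return [(v - math.sqrt(a) * m) / math.sqrt(1.0 - a)
                for v, m in zip(xt, normal)]
    return predict


alpha_bars = linear_alpha_bars(1000)
predictor = make_toy_predictor([0.5] * 16, alpha_bars)
rng = random.Random(0)
clips = [[0.5] * 16, [0.5] * 8 + [3.0] * 8]  # a normal clip, then an anomalous one
scores = [anomaly_score(c, predictor, alpha_bars, 200, rng) for c in clips]
print(min_max_normalize(scores))  # [0.0, 1.0]
```

Because the toy predictor always pulls reconstructions toward the normal pattern, the clip that deviates from it receives the higher error; a trained diffusion model is expected to behave analogously, reconstructing normal content well and anomalous content poorly. The noising step `t` is one of the parameters whose influence such a study would examine.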