Video Anomaly Detection (VAD) plays a crucial role in modern surveillance systems, aiming to identify various anomalies in real-world situations. However, current benchmark datasets predominantly emphasize simple, single-frame anomalies such as novel object detection. This narrow focus restricts the advancement of VAD models. In this research, we advocate for an expansion of VAD investigations to encompass intricate anomalies that extend beyond conventional benchmark boundaries. To facilitate this, we introduce two datasets, HMDB-AD and HMDB-Violence, to challenge models with diverse action-based anomalies. These datasets are derived from the HMDB51 action recognition dataset. We further present Multi-Frame Anomaly Detection (MFAD), a novel method built upon the AI-VAD framework. AI-VAD utilizes single-frame features such as pose estimation and deep image encoding, and two-frame features such as object velocity. It then applies a density estimation algorithm to compute anomaly scores. To address complex multi-frame anomalies, we add deep video encoding features that capture long-range temporal dependencies, and logistic regression to enhance the final score calculation. Experimental results confirm our assumptions, highlighting existing models' limitations with new anomaly types. MFAD excels in both simple and complex anomaly detection scenarios.
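The scoring pipeline described above (per-feature density estimation followed by logistic regression over the per-feature scores) can be illustrated with a minimal sketch. This is not the MFAD implementation: the k-nearest-neighbour distance is used here only as a stand-in density estimator, the two synthetic 2-D "frame" and "video" features replace real pose, image, velocity, and video encodings, and the small labeled calibration set used to fit the logistic regression is an assumption, since the abstract does not state how the regression is trained. All function names are hypothetical.

```python
import math
import random


def knn_score(x, bank, k=3):
    """Mean Euclidean distance from x to its k nearest normal exemplars."""
    dists = sorted(math.dist(x, ref) for ref in bank)
    return sum(dists[:k]) / k


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - y
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)


def make_sample(video_mean):
    """Synthetic (frame_feature, video_feature) pair; only the video part shifts."""
    frame = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    video = [rng.gauss(video_mean, 1.0), rng.gauss(video_mean, 1.0)]
    return frame, video


# Exemplar banks built from normal-only training data, one bank per feature type.
normal_train = [make_sample(0.0) for _ in range(200)]
frame_bank = [s[0] for s in normal_train]
video_bank = [s[1] for s in normal_train]


def per_feature_scores(sample):
    """One density-based anomaly score per feature type."""
    return [knn_score(sample[0], frame_bank), knn_score(sample[1], video_bank)]


# Assumed labeled calibration set: 0 = normal, 1 = multi-frame anomaly.
calib = [make_sample(0.0) for _ in range(40)] + [make_sample(5.0) for _ in range(40)]
labels = [0] * 40 + [1] * 40
w, b = fit_logistic([per_feature_scores(s) for s in calib], labels)


def fused_score(sample):
    """Final anomaly probability from the logistic-regression fusion."""
    scores = per_feature_scores(sample)
    return sigmoid(sum(wi * si for wi, si in zip(w, scores)) + b)


p_normal = fused_score(([0.1, -0.2], [0.0, 0.3]))
p_anomaly = fused_score(([0.0, 0.1], [5.0, 5.0]))
print(f"normal: {p_normal:.3f}  anomalous: {p_anomaly:.3f}")
```

In this toy setup the anomalous sample looks normal in its frame-level feature and deviates only in its video-level feature, which mirrors the motivation for adding a multi-frame feature: a detector that relied on the frame-level score alone would have no signal to separate the two samples.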