In this paper, we propose a new architecture for real-time anomaly detection in video data. Inspired by the way humans combine spatial and temporal cues, the approach uses two distinct models: (i) for temporal analysis, a recurrent convolutional network (CNN + RNN) that couples VGG19 with a GRU to process video sequences; (ii) for spatial analysis, YOLOv7 applied to individual frames. The two analyses can be run either in parallel, with a final prediction that combines the results of both, or in series, where the spatial analysis enriches the data before the temporal analysis. We conduct experiments to compare these two architectural configurations and to evaluate the effectiveness of our hybrid approach for video anomaly detection.
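To make the difference between the two configurations concrete, the following is a minimal sketch of the control flow only, not the implementation used in this work. The temporal and spatial models are passed in as plain callables; the toy stand-ins, the weighted-average fusion rule, the `weight` parameter, and the choice to "enrich" a frame by appending its spatial score are all illustrative assumptions, since the abstract does not specify how the results are combined or how the data is enriched.

```python
from typing import Callable, List, Sequence

Frame = List[float]      # toy stand-in for an image (a real frame is a pixel array)
Clip = Sequence[Frame]   # ordered sequence of frames


def predict_parallel(
    clip: Clip,
    temporal_model: Callable[[Clip], float],
    spatial_model: Callable[[Frame], float],
    weight: float = 0.5,  # hypothetical fusion weight, not taken from the paper
) -> float:
    """Run both analyses independently and fuse their scores."""
    if not clip:
        raise ValueError("clip must contain at least one frame")
    temporal = temporal_model(clip)
    # Assumption: the clip-level spatial score is the highest per-frame score.
    spatial = max(spatial_model(frame) for frame in clip)
    # Assumption: fusion is a simple weighted average of the two scores.
    return weight * temporal + (1.0 - weight) * spatial


def predict_serial(
    clip: Clip,
    temporal_model: Callable[[Clip], float],
    spatial_enrich: Callable[[Frame], Frame],
) -> float:
    """Enrich every frame with spatial information, then run the temporal model."""
    if not clip:
        raise ValueError("clip must contain at least one frame")
    enriched = [spatial_enrich(frame) for frame in clip]
    return temporal_model(enriched)


# --- Toy stand-ins (hypothetical; they replace YOLOv7 and VGG19 + GRU) ---

def toy_spatial(frame: Frame) -> float:
    """Pretend detector: the brightest value acts as the frame's anomaly score."""
    return min(1.0, max(frame))


def toy_temporal(clip: Clip) -> float:
    """Pretend sequence model: largest jump in mean intensity between frames."""
    means = [sum(frame) / len(frame) for frame in clip]
    jumps = [abs(b - a) for a, b in zip(means, means[1:])]
    return min(1.0, max(jumps, default=0.0))


def toy_enrich(frame: Frame) -> Frame:
    """Pretend enrichment: append the spatial score as an extra feature."""
    return frame + [toy_spatial(frame)]


if __name__ == "__main__":
    clip = [[0.1, 0.2], [0.1, 0.2], [0.9, 0.8]]
    print("parallel:", predict_parallel(clip, toy_temporal, toy_spatial))
    print("serial:  ", predict_serial(clip, toy_temporal, toy_enrich))
```

In the parallel configuration the two models never see each other's output, so they can run concurrently and the fusion rule becomes the main design choice. In the serial configuration the temporal model's input depends on the spatial stage, so the two stages cannot overlap for a given frame.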