In this study, we formulate the task of Video Anomaly Detection (VAD) as a probabilistic analysis of object bounding boxes. We hypothesize that representing objects by their bounding boxes alone can be sufficient to identify anomalous events in a scene. The implied benefits of this approach are increased object anonymization, faster model training, and lower computational requirements. This can particularly benefit video surveillance applications running on edge devices such as cameras. We design our model based on human reasoning, which lends itself to explaining model output in human-understandable terms. Meanwhile, the slowest model trains in less than 7 seconds on an 11th Generation Intel Core i9 Processor. While our approach constitutes a drastic reduction of the problem feature space compared with prior art, we show that this does not result in a reduction in performance: the results we report are highly competitive on the benchmark datasets CUHK Avenue and ShanghaiTech, and significantly exceed the latest state-of-the-art results on StreetScene, which has so far proven to be the most challenging VAD dataset.