We propose a novel approach to video anomaly detection: we treat feature vectors extracted from videos as realizations of a random variable with a fixed distribution and model this distribution with a neural network. This lets us estimate the likelihood of test videos and detect anomalies by thresholding the likelihood estimates. We train our video anomaly detector using a modification of denoising score matching, a method that adds noise to the training data to facilitate modeling its distribution. To eliminate hyperparameter selection, we model the distribution of noisy video features across a range of noise levels and introduce a regularizer that tends to align the models for different noise levels. At test time, we combine anomaly indications at multiple noise scales with a Gaussian mixture model. Our detector adds minimal latency, since inference only requires extracting the features and forward-propagating them through a shallow neural network and a Gaussian mixture model. Experiments on five popular video anomaly detection benchmarks demonstrate state-of-the-art performance in both the object-centric and the frame-centric setups.
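The pipeline described above (denoising score matching at several noise levels, followed by a Gaussian density over per-scale anomaly indications) can be illustrated with a minimal NumPy sketch. It rests on simplifying assumptions that do not come from the abstract. The shallow neural network is replaced by one affine score model per noise level, which turns denoising score matching into ordinary least squares. The alignment regularizer is omitted. The per-scale anomaly indication is assumed to be the norm of the estimated score. The Gaussian mixture has a single component. The names `fit_dsm_affine`, `score_norms` and `GaussianScorer`, as well as the synthetic data, are hypothetical and do not represent the authors' implementation.

```python
import numpy as np

def fit_dsm_affine(X, sigma, n_noise=8, rng=None):
    """Denoising score matching at one noise level with a hypothetical affine
    score model s(x) = x @ W + b. Minimizing E||s(x + sigma*eps) + eps/sigma||^2
    over an affine model is an ordinary least-squares problem."""
    rng = np.random.default_rng(0) if rng is None else rng
    Xr = np.repeat(X, n_noise, axis=0)           # several noise draws per sample
    eps = rng.standard_normal(Xr.shape)
    Xn = Xr + sigma * eps                        # noisy features
    target = -eps / sigma                        # score of the Gaussian noising kernel
    A = np.hstack([Xn, np.ones((len(Xn), 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef[:-1], coef[-1]                   # W (d x d), b (d,)

def score_norms(X, models):
    """Assumed anomaly indication: norm of the estimated score at each noise scale."""
    return np.stack([np.linalg.norm(X @ W + b, axis=1) for W, b in models], axis=1)

class GaussianScorer:
    """One-component Gaussian mixture fitted to the per-scale indications."""
    def fit(self, S):
        self.mu = S.mean(axis=0)
        cov = np.cov(S, rowvar=False) + 1e-6 * np.eye(S.shape[1])
        self.prec = np.linalg.inv(cov)
        _, self.logdet = np.linalg.slogdet(cov)
        return self

    def anomaly_score(self, S):
        d = S - self.mu
        maha = np.einsum("ij,jk,ik->i", d, self.prec, d)
        return 0.5 * (maha + self.logdet)        # negative log-likelihood up to a constant

# Synthetic stand-ins for video features (hypothetical data).
rng = np.random.default_rng(0)
scales = np.array([0.2, 0.5, 1.0, 2.0])          # per-dimension std of "normal" features
train = rng.standard_normal((2000, 4)) * scales
test_normal = rng.standard_normal((200, 4)) * scales
test_anom = rng.standard_normal((200, 4)) * scales + 3.0   # shifted features act as anomalies

sigmas = [0.1, 0.5, 1.0]
models = [fit_dsm_affine(train, s, rng=rng) for s in sigmas]
scorer = GaussianScorer().fit(score_norms(train, models))

threshold = np.quantile(scorer.anomaly_score(score_norms(train, models)), 0.95)
a_norm = scorer.anomaly_score(score_norms(test_normal, models))
a_anom = scorer.anomaly_score(score_norms(test_anom, models))
print("flagged normal:", (a_norm > threshold).mean(), "flagged anomalous:", (a_anom > threshold).mean())
```

For Gaussian features with covariance C, the least-squares solution at noise level sigma approaches the true score of the noised distribution, -(C + sigma^2 I)^{-1}(x - mu). This makes the sketch easy to sanity-check. A multi-component mixture, such as scikit-learn's `GaussianMixture` with its `score_samples` method, would sit closer to the abstract's description than the single Gaussian used here.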