A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly adopted definition of an abnormal event, that is, a rarely occurring event that typically depends on the surrounding context. Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events. Our framework is composed of an object detector, a set of appearance and motion auto-encoders, and a set of classifiers. Since our framework only looks at object detections, it can be applied to different scenes, provided that normal events are defined identically across scenes and that the single main factor of variation is the background. To overcome the lack of abnormal data during training, we propose an adversarial learning strategy for the auto-encoders. We create a scene-agnostic set of out-of-domain pseudo-abnormal examples, which the auto-encoders reconstruct correctly before we apply gradient ascent on them. We further use the pseudo-abnormal examples as abnormal examples when training appearance-based and motion-based binary classifiers to discriminate between normal and abnormal latent features and reconstructions. We compare our framework with state-of-the-art methods on four benchmark data sets, using various evaluation metrics. The empirical results indicate that our approach compares favorably with existing methods on all data sets. In addition, we provide region-based and track-based annotations for two large-scale abnormal event detection data sets from the literature, namely ShanghaiTech and Subway.

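The adversarial learning strategy described in the abstract (gradient descent on normal examples, gradient ascent on pseudo-abnormal examples) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the tiny tied-weight linear auto-encoder, the squared reconstruction error, the names `reconstruct`, `recon_error`, `recon_grad` and `adversarial_step`, and the `lr` and `ascent_weight` parameters. The actual framework uses convolutional appearance and motion auto-encoders operating on object detections.

```python
# Toy sketch (hypothetical, not the authors' implementation): a tied-weight
# linear auto-encoder with a 1-D bottleneck, trained with gradient descent on
# a normal example and gradient ascent on a pseudo-abnormal example.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def reconstruct(w, x):
    # Encode to a scalar latent z = w.x, then decode as z * w.
    z = dot(w, x)
    return [z * wi for wi in w]

def recon_error(w, x):
    # Squared reconstruction error ||x - x_hat||^2.
    return sum((xi - xh) ** 2 for xi, xh in zip(x, reconstruct(w, x)))

def recon_grad(w, x):
    # Analytic gradient of recon_error with respect to w.
    z = dot(w, x)
    r = [xi - z * wi for xi, wi in zip(x, w)]  # residual x - x_hat
    rw = dot(r, w)
    return [-2.0 * (z * ri + rw * xi) for ri, xi in zip(r, x)]

def adversarial_step(w, x_normal, x_pseudo, lr=0.1, ascent_weight=1.0):
    # Descent on the normal example, ascent on the pseudo-abnormal one.
    g_n = recon_grad(w, x_normal)
    g_p = recon_grad(w, x_pseudo)
    return [wi - lr * gn + lr * ascent_weight * gp
            for wi, gn, gp in zip(w, g_n, g_p)]

w0 = [0.6, 0.8]         # initial weights: reconstructs both inputs fairly well
x_normal = [1.0, 0.0]   # stands in for a normal example
x_pseudo = [0.0, 1.0]   # stands in for an out-of-domain pseudo-abnormal example

w1 = adversarial_step(w0, x_normal, x_pseudo)
print("normal error:", recon_error(w0, x_normal), "->", recon_error(w1, x_normal))
print("pseudo error:", recon_error(w0, x_pseudo), "->", recon_error(w1, x_pseudo))
```

In this toy setting, a single step lowers the reconstruction error on the normal example and raises it on the pseudo-abnormal one, which is the intended effect of combining descent and ascent. Note that unconstrained gradient ascent can make the loss grow without bound, so a practical implementation would likely need a small ascent weight or some other safeguard; how the paper handles this is not stated in the abstract.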