In this paper, we explore a weakly supervised method for anomaly detection. Since annotating videos is time-consuming, we use only weak video-level labels during training. That is, for each video we know only whether it is normal or contains an anomaly; no further annotations are used to train the network. We extract features from normal and anomalous video clips and use them, together with a classifier trained under a multiple instance learning (MIL) ranking loss, to assign anomaly scores to spatiotemporal regions of the clips. To apply MIL, we represent anomalous and normal video clips as positive and negative bags, respectively. Furthermore, since anomalies are usually localized to a part of a frame rather than the whole frame, we explore both temporal and spatial anomaly detection. We report results on the UCF Crime2Local Dataset, which contains spatiotemporal annotations for a portion of the UCF Crime Dataset.
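To make the bag formulation concrete, the following is a minimal sketch of a hinge-style MIL ranking loss; it is not taken from the paper. It assumes each bag is a list holding one anomaly score per spatiotemporal region, and that the highest-scoring region of a positive (anomalous) bag should outrank the highest-scoring region of a negative (normal) bag by a margin. The function name `mil_ranking_loss`, the `margin` parameter, and the example scores are hypothetical, and any additional regularization terms the authors may use are omitted.

```python
def mil_ranking_loss(pos_bag_scores, neg_bag_scores, margin=1.0):
    """Hinge-style ranking loss between one positive and one negative bag.

    Assumption: each argument is a non-empty list of anomaly scores,
    one per spatiotemporal region (instance) of a video clip.
    """
    # Only a video-level label is known, so each bag is represented
    # by its highest-scoring instance.
    pos_top = max(pos_bag_scores)
    neg_top = max(neg_bag_scores)
    # Penalize the pair unless the anomalous bag's top score exceeds
    # the normal bag's top score by at least `margin`.
    return max(0.0, margin - pos_top + neg_top)


# Illustrative scores only, not results from the paper.
anomalous_clip = [0.1, 0.9, 0.2]
normal_clip = [0.1, 0.2, 0.05]
loss = mil_ranking_loss(anomalous_clip, normal_clip)
```

Because only the top-scoring instance in each bag contributes, the loss needs no region-level labels, which is what lets video-level supervision drive region-level scores.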