Video anomaly detection is an essential but challenging task. Prevalent methods mainly exploit reconstruction differences between normal and abnormal patterns but ignore the semantic consistency between the appearance and motion information of behavior patterns. As a result, their outputs depend heavily on the local context of frame sequences and lack an understanding of behavior semantics. To address this issue, we propose a framework, Appearance-Motion Semantics Representation Consistency, that exploits the gap in consistency between appearance and motion semantic representations for normal versus abnormal data. A two-stream structure encodes the appearance and motion representations of normal samples, and a novel consistency loss enhances the consistency of feature semantics so that anomalies, which exhibit low consistency, can be identified. Moreover, the lower-consistency features of anomalies degrade the quality of the predicted frame, which makes anomalies easier to spot. Experimental results demonstrate the effectiveness of the proposed method.
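To make the consistency idea concrete, the following is a minimal sketch, not the paper's actual loss. The abstract does not state the form of the consistency loss, so this example assumes cosine distance between an appearance feature vector and a motion feature vector; the function names `cosine_similarity`, `consistency_loss`, and `anomaly_score` are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as carrying no agreement
    return dot / (norm_a * norm_b)


def consistency_loss(appearance: Sequence[float], motion: Sequence[float]) -> float:
    """Assumed loss: 0 when the two features align, up to 2 when they oppose."""
    return 1.0 - cosine_similarity(appearance, motion)


def anomaly_score(appearance: Sequence[float], motion: Sequence[float]) -> float:
    """At test time, low appearance-motion consistency maps to a high score."""
    return consistency_loss(appearance, motion)


if __name__ == "__main__":
    # Toy vectors standing in for the outputs of the two encoder streams.
    consistent = anomaly_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    inconsistent = anomaly_score([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
    print(consistent < inconsistent)
```

Under this assumption, minimizing the loss on normal samples pulls the two streams' features together, so a sample whose appearance and motion features disagree receives a higher score. A real system would compute the features with learned encoders and would likely combine this score with the prediction error the abstract mentions.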