Video anomaly detection is an essential yet challenging task in the multimedia community, with promising applications in smart cities and secure communities. Existing methods learn abstract representations of regular events through statistical dependence to model endogenous normality, and then identify anomalies by measuring deviations from the learned distribution. However, conventional representation learning provides only a crude description of video normality and does not explore its underlying causality. The learned statistical dependence is unreliable for the diverse regular events found in the real world and may cause a high false-alarm rate due to overgeneralization. Inspired by causal representation learning, we hypothesize that there exists a causal variable that adequately represents the general patterns of regular events and in which anomalies present significant variations. Therefore, we design a causality-inspired representation consistency (CRC) framework that implicitly learns the unobservable causal variables of normality directly from available normal videos and detects abnormal events using the learned representation consistency. Extensive experiments show that the causality-inspired normality is robust to regular events with label-independent shifts, and that the proposed CRC framework can quickly and accurately detect various complicated anomalies in real-world surveillance videos.