Video anomaly detection, which targets abnormal events in untrimmed videos, is attracting growing research interest. Among its settings, weakly-supervised video anomaly detection is particularly challenging because no frame-wise labels are available during training; only video-level labels provide coarse supervision. Previous methods either learn discriminative features in an end-to-end manner or employ a two-stage self-training strategy to generate snippet-level pseudo labels. Both approaches have limitations: the former tends to overlook informative snippet-level features, while the latter is susceptible to noise. In this paper, we propose an Anomalous Attention mechanism for weakly-supervised anomaly detection to address these problems. Our approach exploits snippet-level encoded features without pseudo-label supervision. Specifically, it first generates snippet-level anomalous attention and then feeds this attention, together with the original anomaly scores, into a Multi-branch Supervision Module. The module learns from different areas of the video, including areas that are challenging to detect, and also assists attention optimization. Experiments on the benchmark datasets XDViolence and UCF-Crime verify the effectiveness of our method. Moreover, thanks to the proposed snippet-level attention, we obtain more precise anomaly localization.
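To make the idea of combining snippet-level attention with anomaly scores under video-level supervision more concrete, the following is a minimal sketch and not the method described above. It assumes top-k pooling over snippets and three hypothetical branches (raw scores, attention-weighted scores, and attention-suppressed scores pushed toward the normal label); the function names, the branch design, and the value of `k` are all illustrative assumptions, since the abstract does not specify them.

```python
import math


def top_k_mean(values, k):
    """Average of the k largest values (top-k pooling over snippets)."""
    k = max(1, min(k, len(values)))
    return sum(sorted(values, reverse=True)[:k]) / k


def bce(pred, label, eps=1e-7):
    """Binary cross-entropy between a probability and a 0/1 label."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(pred) + (1 - label) * math.log(1.0 - pred))


def multi_branch_loss(scores, attention, video_label, k=3):
    """Hypothetical three-branch loss that uses only a video-level label.

    scores, attention: per-snippet values in [0, 1].
    video_label: 1 if the video contains an anomaly, else 0.
    """
    # Branch 1 (assumed): raw anomaly scores pooled to a video-level prediction.
    raw = top_k_mean(scores, k)
    # Branch 2 (assumed): scores re-weighted by the anomalous attention.
    attended = top_k_mean([s * a for s, a in zip(scores, attention)], k)
    # Branch 3 (assumed): scores where attention is low; these snippets
    # are pushed toward the normal label (0) regardless of the video label.
    suppressed = top_k_mean([s * (1 - a) for s, a in zip(scores, attention)], k)
    return bce(raw, video_label) + bce(attended, video_label) + bce(suppressed, 0)


def localize(scores, attention):
    """Per-snippet anomaly estimate: score modulated by attention (assumed)."""
    return [s * a for s, a in zip(scores, attention)]
```

In this sketch no snippet-level pseudo labels are generated: every branch is trained only against the video-level label, and the attention is optimized through the same losses. At inference, `localize` would give a per-snippet estimate in which low-attention snippets are damped.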