Weakly Supervised Video Anomaly Detection (WSVAD) is challenging because the binary anomaly label is given only at the video level, whereas the output requires snippet-level predictions. Multiple Instance Learning (MIL) is therefore the prevailing approach in WSVAD. However, MIL is known to suffer from many false alarms because the snippet-level detector is easily biased towards abnormal snippets with simple context: it is confused by normal snippets that share the same context bias and misses anomalies with a different pattern. To this end, we propose a new MIL framework, Unbiased MIL (UMIL), which learns unbiased anomaly features that improve WSVAD. At each MIL training iteration, we use the current detector to divide the samples into two groups with different context biases: the most confident abnormal/normal snippets and the remaining ambiguous ones. Then, by seeking features that are invariant across the two sample groups, we remove the variant context biases. Extensive experiments on the UCF-Crime and TAD benchmarks demonstrate the effectiveness of UMIL. Our code is available at https://github.com/ktr-hubrt/UMIL.
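The grouping step described above can be illustrated with a minimal sketch. It is not taken from the UMIL code release: the function name `split_by_confidence`, the `margin` hyperparameter, and the fixed-threshold rule are all assumptions made for illustration, and the actual method may select confident snippets differently. The sketch only shows how the current detector's snippet scores could be partitioned into a confident group (with pseudo-labels) and an ambiguous group; the subsequent invariant-feature learning is not covered.

```python
import numpy as np


def split_by_confidence(scores, margin=0.2):
    """Split snippets into a confident group and an ambiguous group.

    scores: 1-D sequence of anomaly probabilities in [0, 1] produced by
            the current snippet-level detector.
    margin: hypothetical hyperparameter (an assumption, not from the paper);
            a snippet is "confident" if its score is <= margin (normal)
            or >= 1 - margin (abnormal).
    """
    scores = np.asarray(scores, dtype=float)
    confident_mask = (scores <= margin) | (scores >= 1.0 - margin)

    confident_idx = np.flatnonzero(confident_mask)
    ambiguous_idx = np.flatnonzero(~confident_mask)

    # Pseudo-labels exist only for the confident group: 1 = abnormal, 0 = normal.
    pseudo_labels = (scores[confident_idx] >= 0.5).astype(int)
    return confident_idx, pseudo_labels, ambiguous_idx


if __name__ == "__main__":
    snippet_scores = [0.05, 0.95, 0.50, 0.30, 0.85]
    conf_idx, labels, amb_idx = split_by_confidence(snippet_scores)
    print("confident:", conf_idx, "pseudo-labels:", labels)
    print("ambiguous:", amb_idx)
```

Because the split depends on the current detector's scores, it would be recomputed at every training iteration, so the two groups change as the detector improves.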