Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications, ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g., image patches or future video frames) and using the magnitude of the reconstruction error as an indicator of the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that incorporates the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible: it enables information masking at any layer of a neural network and is compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, and a novel self-supervised objective based on the Huber loss. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We demonstrate the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, obtaining empirical results that confirm considerable performance improvements on five benchmarks. We release our code and data as open source at: https://github.com/ristea/ssmctb.
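To make the masked-reconstruction idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not the released SSMCTB code. It assumes a single-channel 2D image, a fixed 3×3 kernel whose centre tap is skipped (the masked position), replicate padding at the borders, and a Huber delta of 1.0. The actual block operates on multi-channel tensors with a learned 3D masked convolution and adds a transformer for channel-wise attention, both omitted here. The names `masked_conv2d`, `huber`, `anomaly_map` and `reconstruction_loss` are illustrative only.

```python
from typing import List

Grid = List[List[float]]  # single-channel image, row-major


def masked_conv2d(image: Grid, kernel: Grid) -> Grid:
    """Predict every pixel from its neighbours only (hypothetical helper).

    The centre tap of the kernel is skipped, so the output at (i, j) never
    sees the input value at (i, j). Borders use replicate padding.
    """
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    if di == 0 and dj == 0:
                        continue  # masked position: the value to reconstruct
                    y = min(max(i + di, 0), h - 1)  # replicate padding
                    x = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + r][dj + r] * image[y][x]
            out[i][j] = acc
    return out


def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic below delta, linear above it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def anomaly_map(image: Grid, recon: Grid, delta: float = 1.0) -> Grid:
    """Per-pixel Huber reconstruction error, used as an abnormality score."""
    return [[huber(p - x, delta) for x, p in zip(row_x, row_p)]
            for row_x, row_p in zip(image, recon)]


def reconstruction_loss(image: Grid, recon: Grid, delta: float = 1.0) -> float:
    """Mean Huber error: the self-supervised objective in this sketch."""
    errors = anomaly_map(image, recon, delta)
    total = sum(sum(row) for row in errors)
    return total / (len(errors) * len(errors[0]))


if __name__ == "__main__":
    # Fixed kernel averaging the 8 neighbours; a trained block would learn it.
    kernel = [[0.125, 0.125, 0.125], [0.125, 0.0, 0.125], [0.125, 0.125, 0.125]]
    normal = [[1.0] * 5 for _ in range(5)]
    anomalous = [row[:] for row in normal]
    anomalous[2][2] = 10.0  # a single "defect" pixel
    for name, img in (("normal", normal), ("anomalous", anomalous)):
        print(name, reconstruction_loss(img, masked_conv2d(img, kernel)))
    # Expected output: "normal 0.0" then "anomalous 0.54"
```

Skipping the centre tap is what makes the task self-supervised in this sketch: the block cannot copy its input and must infer each value from its context, so a pixel that is inconsistent with its neighbourhood produces a large error, and the per-pixel map peaks at the defect. The Huber loss grows only linearly for residuals above delta, which limits the influence of extreme residuals compared with a squared error.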
