Self-supervised learning (SSL) has emerged as a promising alternative for creating supervisory signals for real-world problems, avoiding the high cost of manual labeling. SSL is particularly attractive for unsupervised tasks such as anomaly detection (AD), where labeled anomalies are rare or nonexistent. A large catalog of augmentation functions has been used for SSL-based AD (SSAD) on image data, and recent works have reported that the type of augmentation has a significant impact on accuracy. Motivated by these findings, this work puts image-based SSAD under a larger lens and investigates the role of data augmentation in SSAD. Through extensive experiments on three different detector models and across 420 AD tasks, we provide comprehensive numerical and visual evidence that the alignment between data augmentation and the anomaly-generating mechanism is the key to the success of SSAD, and that in its absence, SSL may even impair accuracy. To the best of our knowledge, this is the first meta-analysis of the role of data augmentation in SSAD.