Self-supervised learning (SSL) has emerged as a promising alternative for creating supervisory signals for real-world problems, avoiding the extensive cost of manual labeling. SSL is particularly attractive for unsupervised tasks such as anomaly detection (AD), where labeled anomalies are rare or often nonexistent. A large catalog of augmentation functions has been used for SSL-based AD (SSAD) on image data, and recent works have reported that the type of augmentation has a significant impact on accuracy. Motivated by these findings, this work puts image-based SSAD under a larger lens and investigates the role of data augmentation in SSAD. Through extensive experiments on 3 different detector models and across 420 AD tasks, we provide comprehensive numerical and visual evidence that the alignment between data augmentation and the anomaly-generating mechanism is the key to the success of SSAD, and that in its absence, SSL may even impair accuracy. To the best of our knowledge, this is the first meta-analysis on the role of data augmentation in SSAD.