Self-supervised learning (SSL) has proven effective across a variety of problems by generating its own supervisory signals. Unsupervised anomaly detection, where true labels are costly to obtain, stands to benefit greatly from SSL. However, recent literature suggests that tuning the hyperparameters (HPs) of data augmentation functions is crucial to the success of SSL-based anomaly detection (SSAD), yet no systematic method for doing so is known. In this work, we propose DSV (Discordance and Separability Validation), an unsupervised validation loss for selecting high-performing detection models with effective augmentation HPs. DSV captures the alignment between an augmentation function and the anomaly-generating mechanism using surrogate losses that approximate the discordance and the separability of test data, respectively. Evaluating candidate models with DSV therefore favors SSAD models with better alignment, which in turn yields high detection accuracy. We theoretically derive the degree of approximation achieved by the surrogate losses and empirically show that DSV outperforms a wide range of baselines on 21 real-world tasks.