Unsupervised Anomaly Detection (UAD) is a key data mining problem owing to its wide range of real-world applications. Because supervision signals are completely absent, UAD methods rely on implicit assumptions about anomalous patterns (e.g., that anomalies are scattered, sparsely clustered, or densely clustered) to detect anomalies. However, real-world data are complex and vary significantly across domains. No single assumption can describe such complexity and remain valid in all scenarios, a point confirmed by recent research showing that no UAD method is omnipotent. Based on these observations, instead of searching for a magic, universally winning assumption, we seek to design a general UAD Booster (UADB) that empowers any UAD model with adaptability to different data. This is a challenging task given the heterogeneous model structures and assumptions adopted by existing UAD methods. To achieve this, we dive deep into the UAD problem and find that, compared to normal data, anomalies (i) lack clear structure or pattern in feature space and are thus (ii) harder for a model to learn without a suitable assumption, which finally leads to (iii) high variance between different learners. In light of these findings, we propose to (i) distill the knowledge of the source UAD model into an imitation learner (the booster) that holds no data assumption, then (ii) exploit the variance between the two to perform automatic correction, and thus (iii) improve the booster over the original UAD model. We use a neural network as the booster for its strong expressive power as a universal approximator and its ability to perform flexible post-hoc tuning. Note that UADB is a model-agnostic framework that can enhance heterogeneous UAD models in a unified way. Extensive experiments on over 80 tabular datasets demonstrate the effectiveness of UADB.
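The three-step idea in the abstract (distill the source model into an assumption-free booster, measure the variance between them, and use it for correction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several parts are assumptions made purely for illustration: the stand-in source detector `source_scores` (distance to the feature mean), the small network `TinyMLP`, and in particular the correction rule in `boost`, which adds the per-sample variance across past predictions to the pseudo-labels and rescales them to [0, 1]. The abstract states only that the variance is exploited for automatic correction, not how.

```python
import numpy as np


def minmax(v):
    """Rescale a vector to [0, 1]; a constant vector maps to zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def source_scores(X):
    """Hypothetical source UAD model: distance to the feature-wise mean."""
    return np.linalg.norm(X - X.mean(axis=0), axis=1)


class TinyMLP:
    """One-hidden-layer network with a sigmoid output (assumed booster)."""

    def __init__(self, n_in, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def predict(self, X):
        h = np.tanh(X @ self.W1 + self.b1)
        return 1.0 / (1.0 + np.exp(-(h @ self.W2 + self.b2)))

    def fit(self, X, y, epochs=200, lr=0.5):
        """Full-batch gradient descent on cross-entropy with soft targets y."""
        n = len(X)
        for _ in range(epochs):
            h = np.tanh(X @ self.W1 + self.b1)
            p = 1.0 / (1.0 + np.exp(-(h @ self.W2 + self.b2)))
            g_out = (p - y) / n                      # dLoss / dlogit
            g_h = np.outer(g_out, self.W2) * (1.0 - h ** 2)
            self.W2 -= lr * (h.T @ g_out)
            self.b2 -= lr * g_out.sum()
            self.W1 -= lr * (X.T @ g_h)
            self.b1 -= lr * g_h.sum(axis=0)
        return self


def boost(X, n_iter=5, seed=0):
    """Distill source scores into a booster, then correct them with variance."""
    pseudo = minmax(source_scores(X))    # (i) teacher scores as soft labels
    history = [pseudo]                   # source output plus booster outputs
    booster = TinyMLP(X.shape[1], seed=seed)
    for _ in range(n_iter):
        booster.fit(X, pseudo)           # imitate the current pseudo-labels
        history.append(booster.predict(X))
        # (ii) per-sample disagreement across all predictions so far
        variance = np.var(np.stack(history), axis=0)
        # (iii) assumed correction rule: raise scores where learners disagree
        pseudo = minmax(pseudo + variance)
    return booster, pseudo


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(size=(200, 2))
    outliers = rng.uniform(-6.0, 6.0, size=(10, 2))
    X = np.vstack([inliers, outliers])
    booster, pseudo = boost(X)
    print(booster.predict(X)[-10:])      # booster scores for the outliers
```

Because `boost` touches the source model only through its score vector, any detector that outputs per-sample anomaly scores could replace `source_scores`, which mirrors the model-agnostic claim in the abstract. Whether this simplified rule improves detection on a given dataset is not something the sketch demonstrates.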