UADB: Unsupervised Anomaly Detection Booster

Unsupervised Anomaly Detection (UAD) is a key data mining problem owing to its wide real-world applications. Because supervision signals are completely absent, UAD methods rely on implicit assumptions about anomalous patterns (e.g., that anomalies are scattered, sparsely clustered, or densely clustered) to detect anomalies. However, real-world data are complex and vary significantly across domains. No single assumption can describe such complexity and remain valid in all scenarios. This is confirmed by recent research showing that no UAD method is omnipotent. Based on these observations, instead of searching for a magic universal winning assumption, we seek to design a general UAD Booster (UADB) that gives any UAD model adaptability to different data. This is a challenging task given the heterogeneous model structures and assumptions adopted by existing UAD methods. To achieve this, we dive deep into the UAD problem and find that, compared to normal data, anomalies (i) lack clear structure or pattern in feature space, and are thus (ii) harder for a model to learn without a suitable assumption, which finally leads to (iii) high variance between different learners. In light of these findings, we propose to (i) distill the knowledge of the source UAD model into an imitation learner (the booster) that holds no data assumption, then (ii) exploit the variance between the two to perform automatic correction, and thus (iii) improve the booster over the original UAD model. We use a neural network as the booster for its strong expressive power as a universal approximator and its ability to perform flexible post-hoc tuning. Note that UADB is a model-agnostic framework that can enhance heterogeneous UAD models in a unified way. Extensive experiments on over 80 tabular datasets demonstrate the effectiveness of UADB.
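The three steps in the abstract (distill, exploit variance, improve) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The abstract does not state the correction rule, so the additive update `pseudo + variance` followed by min-max rescaling is an assumption, as are the distance-to-mean stand-in for the source UAD model and the names `source_scores`, `TinyMLP`, and `boost`.

```python
import numpy as np

def minmax(v):
    """Scale scores to [0, 1]."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def source_scores(X):
    """Hypothetical source UAD model: distance to the feature-wise mean."""
    return minmax(np.linalg.norm(X - X.mean(axis=0), axis=1))

class TinyMLP:
    """One-hidden-layer network (the booster), full-batch gradient descent."""
    def __init__(self, n_in, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, X):
        h = np.tanh(X @ self.W1 + self.b1)
        z = (h @ self.W2 + self.b2)[:, 0]
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y, epochs=200, lr=0.5):
        n = len(X)
        for _ in range(epochs):
            h = np.tanh(X @ self.W1 + self.b1)
            p = 1.0 / (1.0 + np.exp(-(h @ self.W2 + self.b2)[:, 0]))
            # Gradient of mean cross-entropy w.r.t. the pre-sigmoid output.
            g = ((p - y) / n)[:, None]
            gh = (g @ self.W2.T) * (1.0 - h ** 2)
            self.W2 -= lr * (h.T @ g)
            self.b2 -= lr * g.sum(axis=0)
            self.W1 -= lr * (X.T @ gh)
            self.b1 -= lr * gh.sum(axis=0)
        return self

def boost(X, n_iter=5, seed=0):
    """Iteratively distill pseudo-labels into the booster and correct them."""
    pseudo = source_scores(X)            # initial pseudo-labels from the source model
    booster = TinyMLP(X.shape[1], seed=seed)
    history = [pseudo]
    for _ in range(n_iter):
        booster.fit(X, pseudo)           # (i) imitate the current pseudo-labels
        history.append(booster.predict(X))
        # (ii) per-sample variance across source and booster predictions
        variance = np.var(np.stack(history), axis=0)
        # (iii) assumed correction: raise scores where the learners disagree
        pseudo = minmax(pseudo + variance)
    return booster, pseudo

# Toy usage: 200 inliers around the origin plus 10 shifted points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (10, 2))])
booster, corrected = boost(X)
scores = booster.predict(X)              # higher means more anomalous
```

The booster here makes no assumption about how anomalies are distributed; it only fits whatever pseudo-labels it is given, which is why the same loop could in principle wrap any source detector that outputs a score per sample.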

