Anomaly detection is vital in many domains, such as finance, healthcare, and cybersecurity. In this paper, we propose a novel deep anomaly detection method for tabular data that leverages Non-Parametric Transformers (NPTs), a model initially proposed for supervised tasks, to capture both feature-feature and sample-sample dependencies. Within a reconstruction-based framework, we train an NPT to reconstruct masked features of normal samples. At inference, the method operates non-parametrically: it uses the whole training set and takes the model's ability to reconstruct the masked features of a sample as the basis for that sample's anomaly score. To the best of our knowledge, this is the first work to successfully combine feature-feature and sample-sample dependencies for anomaly detection on tabular datasets. Through extensive experiments on 31 benchmark tabular datasets, we demonstrate that our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively. Our ablation study further indicates that modeling both types of dependencies is crucial for anomaly detection on tabular data.
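To make the mask-and-reconstruct scoring idea concrete, the following is a minimal toy sketch, not the method proposed in the paper. It assumes a simple k-nearest-neighbour average as a stand-in for the trained NPT: each feature is masked in turn, reconstructed from the normal training samples that look most similar on the remaining features, and the squared reconstruction errors are summed into an anomaly score. The function name `mask_reconstruction_scores`, the parameter `k`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def mask_reconstruction_scores(train, test, k=5):
    """Score test rows by how poorly their masked features are reconstructed.

    Assumption: a k-nearest-neighbour average over the normal training set
    replaces the learned model. This is NOT a Non-Parametric Transformer.
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    n_features = train.shape[1]
    scores = np.zeros(len(test))
    for j in range(n_features):
        # Mask feature j: only the other columns are visible.
        visible = [c for c in range(n_features) if c != j]
        test_vis = test[:, visible][:, None, :]    # shape (m, 1, d-1)
        train_vis = train[:, visible][None, :, :]  # shape (1, n, d-1)
        # Squared distance from every test row to every training row.
        dist = ((test_vis - train_vis) ** 2).sum(axis=2)
        # The k closest normal samples for each test row.
        nearest = np.argsort(dist, axis=1)[:, :k]
        # Reconstruct the masked feature as the neighbours' mean value.
        recon = train[nearest, j].mean(axis=1)
        # Accumulate the squared reconstruction error for feature j.
        scores += (test[:, j] - recon) ** 2
    return scores


# Synthetic "normal" data in which the second feature is twice the first.
x = np.linspace(0.0, 1.0, 50)
train = np.column_stack([x, 2.0 * x])

# One row that follows the dependency and one that breaks it.
test_rows = np.array([[0.5, 1.0], [0.5, -1.0]])
scores = mask_reconstruction_scores(train, test_rows, k=5)
print(scores)
```

In this toy setting, the row that violates the relationship between the two features is reconstructed poorly and therefore receives the larger score. The sketch relies only on similarity between samples; the paper's claim is that a learned model capturing both feature-feature and sample-sample dependencies does this considerably better.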