We propose a novel Conditional Latent space Variational Autoencoder (CL-VAE) to perform improved pre-processing for anomaly detection on data with known inlier classes and unknown outlier classes. The proposed variational autoencoder (VAE) improves latent space separation by conditioning on information within the data. The method fits a unique prior distribution to each class in the dataset, effectively expanding the classic VAE prior into a Gaussian mixture model. An ensemble of these VAEs is merged in latent space to form a group consensus that greatly improves anomaly detection accuracy across datasets. We compare our approach against a standard VAE, a CNN, and PCA in terms of AUC for anomaly detection. The proposed model achieves higher accuracy in anomaly detection, reaching an AUC of 97.4% on the MNIST dataset compared to 95.7% for the second-best model. In addition, the CL-VAE benefits more from ensembling, yields a more interpretable latent space, and shows an increased ability to learn patterns in complex data with limited model sizes.
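The core idea of fitting a unique prior per class can be illustrated with the closed-form KL term of a Gaussian VAE, where the usual standard-normal prior is replaced by a Gaussian centered at the class's own prior mean. This is a minimal sketch under our own assumptions (function and variable names are hypothetical, prior means are fixed unit-variance Gaussians for illustration), not the paper's implementation:

```python
import numpy as np

def class_conditional_kl(mu, log_var, prior_mu):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mu, I) ) for one sample.

    Hypothetical sketch: prior_mu is the mean of the Gaussian prior assigned
    to the sample's known class, so each inlier class is pulled toward its
    own region of the latent space instead of a single shared N(0, I).
    """
    var = np.exp(log_var)
    return 0.5 * np.sum(var + (mu - prior_mu) ** 2 - 1.0 - log_var, axis=-1)

# Toy usage: two classes with separated prior means in a 2-D latent space.
prior_means = {0: np.array([-2.0, 0.0]), 1: np.array([2.0, 0.0])}
mu = np.array([[-1.9, 0.1],   # encoder mean for a sample of class 0
               [0.0, 0.0]])   # encoder mean for a sample of class 1
log_var = np.zeros_like(mu)   # unit encoder variance
labels = [0, 1]
kls = np.array([class_conditional_kl(mu[i], log_var[i], prior_means[labels[i]])
                for i in range(2)])
```

Here the first sample sits near its class prior and incurs a small KL penalty, while the second sits far from its class prior and is penalized more; summed over classes, the collection of per-class priors acts as the Gaussian mixture prior described above.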