We propose a new method for novelty detection that can tolerate a high level of corruption in the training points, whereas previous works assumed either no corruption or very low corruption. Our method trains a robust variational autoencoder (VAE) that aims to model the uncorrupted training points. To gain robustness to high corruption, we make the following four changes to the common VAE: 1. extracting crucial features of the latent code with a carefully designed dimension-reduction component for distributions; 2. modeling the latent distribution as a mixture of low-rank Gaussian inliers and full-rank outliers, where testing uses only the inlier model; 3. applying the Wasserstein-1 metric for regularization instead of the Kullback-Leibler (KL) divergence; and 4. using a robust error for reconstruction. We establish that, unlike the KL divergence, the Wasserstein metric is both robust to outliers and suitable for low-rank modeling. We demonstrate state-of-the-art results on standard benchmarks.
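Why the KL divergence is ill-suited to low-rank latent models, while a Wasserstein distance is not, can be seen in a toy setting. The sketch below is an illustration only, not the method described here. It assumes diagonal Gaussians (a simplification) and uses the closed-form 2-Wasserstein distance as a stand-in. Because W1 has no general closed form between Gaussians and W1 ≤ W2 always holds, W2 serves as an upper bound on W1. The function names `kl_diag_to_standard` and `w2_diag` are hypothetical. When one latent variance collapses to zero (a rank-deficient Gaussian), the VAE's usual KL term diverges, but the Wasserstein distance stays finite.

```python
import math
from typing import Sequence


def kl_diag_to_standard(mu: Sequence[float], var: Sequence[float]) -> float:
    """KL( N(mu, diag(var)) || N(0, I) ), the standard VAE regularizer.

    Returns math.inf when any variance is zero, i.e. for a degenerate
    (low-rank) Gaussian, because the -log(var) term diverges.
    """
    total = 0.0
    for m, v in zip(mu, var):
        if v <= 0.0:
            return math.inf  # log(0) is undefined: KL is infinite here
        total += v + m * m - 1.0 - math.log(v)
    return 0.5 * total


def w2_diag(mu1: Sequence[float], var1: Sequence[float],
            mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians.

    W2^2 = ||mu1 - mu2||^2 + sum_i (sqrt(var1_i) - sqrt(var2_i))^2.
    The formula remains valid when some variances are zero, and since
    W1 <= W2 the result also upper-bounds the Wasserstein-1 distance.
    """
    sq = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        sq += (m1 - m2) ** 2 + (math.sqrt(v1) - math.sqrt(v2)) ** 2
    return math.sqrt(sq)


if __name__ == "__main__":
    mu = [0.5, -0.2, 0.0]
    full_rank = [0.8, 1.1, 0.9]
    low_rank = [0.8, 1.1, 0.0]  # last direction collapsed: rank 2 in R^3
    zeros, ones = [0.0] * 3, [1.0] * 3

    print("KL, full-rank:", kl_diag_to_standard(mu, full_rank))
    print("KL, low-rank: ", kl_diag_to_standard(mu, low_rank))
    print("W2, low-rank: ", w2_diag(mu, low_rank, zeros, ones))
```

In this toy case the KL regularizer effectively forbids the encoder from collapsing any latent direction. The Wasserstein bound instead penalizes the collapse by a finite amount (the `(sqrt(0) - 1)^2` term contributes exactly 1 to W2²).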