Anomalies are samples that deviate significantly from the rest of the data. Detecting them plays a major role in building machine learning models that can be reliably used in applications such as data-driven design and novelty detection. Most existing anomaly detection methods are either developed exclusively for (semi-)supervised settings or perform poorly in unsupervised applications, where no training data with labeled anomalous samples is available. To bridge this research gap, we introduce a robust, efficient, and interpretable methodology based on nonlinear manifold learning to detect anomalies in unsupervised settings. The essence of our approach is to learn a low-dimensional and interpretable latent representation (i.e., a manifold) for all the data points such that normal samples are automatically clustered together and hence can be easily and robustly identified. We learn this low-dimensional manifold by designing a learning algorithm that leverages either a latent map Gaussian process (LMGP) or a deep autoencoder (AE). Our LMGP-based approach, in particular, provides a probabilistic perspective on the learning task and is ideal for high-dimensional applications with scarce data. We demonstrate the superior performance of our approach over existing technologies via multiple analytic examples and real-world datasets.
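To make the idea of identifying normal samples from a learned latent space concrete, the following is a minimal sketch and not the procedure used in this work. It assumes that two-dimensional latent coordinates have already been produced by a trained LMGP or AE (neither is implemented here), and it swaps in a simple median and median-absolute-deviation (MAD) rule to separate the dominant cluster from stray points. The function name `flag_anomalies`, the `threshold` value, and the toy coordinates are all hypothetical.

```python
import math
import statistics


def flag_anomalies(latent_points, threshold=3.5):
    """Flag latent points that lie far from the dominant cluster.

    latent_points: list of coordinate tuples, assumed to come from an
    already-trained LMGP or autoencoder (not implemented here).
    threshold: hypothetical cutoff on the robust z-score of the distances.
    """
    # Robust cluster center: the coordinate-wise median is barely moved
    # by a small number of outlying points.
    dims = len(latent_points[0])
    center = [statistics.median(p[d] for p in latent_points) for d in range(dims)]

    # Euclidean distance of every point to that center.
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center)))
        for p in latent_points
    ]

    # Robust scale: median absolute deviation (MAD) of the distances,
    # rescaled by 1.4826 so it matches the standard deviation for
    # normally distributed data.
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    scale = 1.4826 * mad if mad > 0 else 1e-12

    # A point is flagged when its distance is unusually large.
    return [(d - med) / scale > threshold for d in dists]


# Toy latent coordinates (made up): a tight cluster plus two stray points.
latent = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1),
          (0.05, 0.05), (-0.05, 0.05), (3.0, 3.0), (-4.0, 2.5)]
flags = flag_anomalies(latent)
```

On this toy input, only the last two points are flagged. The median-based rule is chosen here because it needs no labels and is insensitive to a few outliers; any clustering step that isolates the dominant group in the latent space could take its place.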