Probabilistic mixture models are widely recognized as a valuable tool for unsupervised outlier detection because they are interpretable and grounded in statistical principles. Within this framework, Dirichlet process mixture models are a compelling alternative to conventional finite mixture models for both clustering and outlier detection. Despite these advantages, their adoption for unsupervised outlier detection has been limited by two challenges: computational cost, and sensitivity to outliers present in the data used to build the detector. To address these challenges, we propose a novel outlier detection method based on ensembles of Dirichlet process Gaussian mixtures. The method is fully unsupervised and builds random subspace and subsampling ensembles, which both reduce computation and make the resulting outlier detector more robust. It also uses variational inference for Dirichlet process mixtures to keep model fitting fast. Empirical studies on benchmark datasets show that our method outperforms existing approaches for unsupervised outlier detection.
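The core idea of an ensemble of Dirichlet process Gaussian mixtures built from random subspaces and subsamples can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn's `BayesianGaussianMixture` with `weight_concentration_prior_type="dirichlet_process"` as the variational DP mixture, and it uses the negative log-density as each member's outlier score. The function name `dpgmm_ensemble_scores`, its parameters (`n_members`, `subspace_frac`, `subsample_size`, `max_components`), and the combination rule (averaging per-member standardized scores) are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def dpgmm_ensemble_scores(X, n_members=20, subspace_frac=0.5,
                          subsample_size=256, max_components=10,
                          random_state=0):
    """Hypothetical sketch: outlier scores from an ensemble of variational
    DP Gaussian mixtures. Higher score means more outlying."""
    rng = np.random.default_rng(random_state)
    n, d = X.shape
    k = max(1, int(round(subspace_frac * d)))   # features per member
    m = min(subsample_size, n)                  # rows per member
    member_scores = np.zeros((n_members, n))
    for b in range(n_members):
        feats = rng.choice(d, size=k, replace=False)  # random feature subspace
        rows = rng.choice(n, size=m, replace=False)   # subsample without replacement
        model = BayesianGaussianMixture(
            n_components=max_components,  # truncation level of the variational DP
            weight_concentration_prior_type="dirichlet_process",
            covariance_type="full",
            max_iter=500,
            random_state=int(rng.integers(1 << 31)),
        )
        model.fit(X[np.ix_(rows, feats)])
        # score_samples returns per-sample log-density; negate it so that
        # low-density points receive high outlier scores
        s = -model.score_samples(X[:, feats])
        # standardize so members fit on different subspaces are comparable
        member_scores[b] = (s - s.mean()) / (s.std() + 1e-12)
    return member_scores.mean(axis=0)


# Usage on synthetic data: 300 Gaussian inliers plus 5 shifted points.
data_rng = np.random.default_rng(1)
inliers = data_rng.normal(size=(300, 4))
outliers = data_rng.normal(loc=6.0, size=(5, 4))
X = np.vstack([inliers, outliers])
scores = dpgmm_ensemble_scores(X)
print(np.argsort(scores)[-5:])  # indices of the five highest scores
```

Fitting each member on a small subsample keeps the cost of variational inference bounded. Because each subsample is likely to contain few outliers, a member is also less prone to fitting mixture components to them, which is the intuition behind the robustness claim. The DP prior lets each member leave unneeded components with negligible weight instead of requiring a fixed number of clusters.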