Probabilistic mixture models are recognized as effective tools for unsupervised outlier detection owing to their interpretability and global characteristics. Among these, Dirichlet process mixture models stand out as a strong alternative to conventional finite mixture models for both clustering and outlier detection. Unlike finite mixture models, Dirichlet process mixtures are infinite mixture models that infer the number of mixture components from the data. Despite these advantages, the adoption of Dirichlet process mixture models for unsupervised outlier detection has been limited by their computational cost and by the sensitivity of the resulting detectors to outliers present in the training data. Additionally, Dirichlet process Gaussian mixtures struggle to model non-Gaussian data with discrete or binary features. To address these challenges, we propose a novel outlier detection method based on ensembles of Dirichlet process Gaussian mixtures. This unsupervised algorithm combines random subspace and subsampling ensembles to keep computation efficient and to improve the robustness of the outlier detector. The ensemble approach also makes the method better suited to detecting outliers in non-Gaussian data. Furthermore, our method fits the Dirichlet process mixtures with variational inference, which keeps computation fast. Empirical analyses on benchmark datasets demonstrate that our method outperforms existing approaches to unsupervised outlier detection.
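To make the ensemble idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes scikit-learn's `BayesianGaussianMixture` with `weight_concentration_prior_type="dirichlet_process"` as the variationally fitted Dirichlet process Gaussian mixture. The function name `dp_gmm_ensemble_scores`, the defaults for ensemble size, subsample size, feature fraction, and truncation level, and the rule of averaging negated, standardized log-likelihoods are all hypothetical choices made for illustration. Each member is fitted on a random subsample of rows restricted to a random subset of features, and then scores every point in that subspace.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def dp_gmm_ensemble_scores(X, n_estimators=20, subsample=256,
                           max_features=0.5, n_components=10, seed=0):
    """Hypothetical ensemble outlier score (higher = more outlying).

    Each member is a variational DP Gaussian mixture fitted on a random
    row subsample and a random feature subspace (illustrative sketch only).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(round(max_features * d)))  # size of each random subspace
    scores = np.zeros(n)
    for _ in range(n_estimators):
        rows = rng.choice(n, size=min(subsample, n), replace=False)  # subsampling
        cols = rng.choice(d, size=k, replace=False)                  # random subspace
        dpgmm = BayesianGaussianMixture(
            n_components=n_components,  # truncation level; unused components get ~0 weight
            weight_concentration_prior_type="dirichlet_process",
            covariance_type="full",
            max_iter=200,
            random_state=int(rng.integers(2**31 - 1)),
        )
        dpgmm.fit(X[np.ix_(rows, cols)])
        loglik = dpgmm.score_samples(X[:, cols])  # per-point log-likelihood in this subspace
        # Standardize so members fitted on different subspaces are comparable.
        z = (loglik - loglik.mean()) / (loglik.std() + 1e-12)
        scores -= z  # low likelihood -> high outlier score
    return scores / n_estimators
```

Subsampling keeps each fit cheap and lowers the chance that a given member is trained on many outliers; the random subspaces diversify the members, so a point that is only mildly unusual in a few feature subsets receives a lower averaged score than a point that looks anomalous in most of them.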