This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that addresses two limitations that often occur together in conventional projection-based methods: reliance on a fixed statistical metric and the assumption of a single data structure. The framework combines three components: (1) a generalized loss-based outlyingness measure (PLO) that replaces the fixed metric with flexible, adaptive loss functions such as our proposed SVM-like loss; (2) a global kernel PCA stage that linearizes non-linear data structures; and (3) a subsequent local clustering stage that handles multi-modal distributions. Comprehensive 5-fold cross-validation experiments on 10 benchmark datasets, with automated hyperparameter optimization, show that Two-Stage LKPLO achieves state-of-the-art performance. It significantly outperforms strong baselines on datasets with challenging structures where existing methods fail, most notably on multi-cluster data (Optdigits) and complex, high-dimensional data (Arrhythmia). Furthermore, an ablation study confirms empirically that the combination of the kernelization and localization stages is essential to this performance. This work contributes a new tool for a significant class of outlier detection problems and underscores the value of hybrid, multi-stage architectures.
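The data flow of the two stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the global stage is taken to be standard RBF kernel PCA (with a hypothetical `gamma`), the local stage is stood in for by plain k-means with random restarts, and the outlyingness measure is the classical Stahel-Donoho form, the maximum over random unit directions of the median/MAD-standardized projection. The abstract's PLO generalizes exactly this fixed metric, and the `loss` argument below is only a hypothetical hook marking where a different loss could be plugged in; the proposed SVM-like loss is not reproduced. All function and parameter names (`rbf_kernel_pca`, `kmeans`, `projection_outlyingness`, `two_stage_scores`, `min_size`, `n_dirs`) are illustrative.

```python
import numpy as np


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Global stage (assumed): standard RBF kernel PCA on the training points."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H                          # centre the data in feature space
    vals, vecs = np.linalg.eigh(Kc)         # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    # Projection of point i onto component k is sqrt(lambda_k) * v_k[i].
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def kmeans(Z, k, n_init=10, n_iter=100, seed=0):
    """Local stage (stand-in): Lloyd's k-means, keeping the best of n_init starts."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((Z - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def projection_outlyingness(Z_ref, Z_query, loss=np.abs, n_dirs=200, seed=0):
    """Stahel-Donoho-style score: max over random unit directions a of
    loss((a.z - median) / MAD), with median and MAD estimated on Z_ref.
    loss=np.abs gives the classical fixed metric; other losses are hypothetical."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_dirs, Z_ref.shape[1]))
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    P_ref = Z_ref @ A.T
    med = np.median(P_ref, axis=0)
    mad = np.median(np.abs(P_ref - med), axis=0) + 1e-12  # avoid division by zero
    return np.max(loss((Z_query @ A.T - med) / mad), axis=1)


def two_stage_scores(X, n_components=2, gamma=1.0, k=2, min_size=5, loss=np.abs):
    """Kernel PCA embedding, then score each point against its own cluster."""
    Z = rbf_kernel_pca(X, n_components, gamma)
    labels = kmeans(Z, k)
    scores = np.empty(len(X))
    for j in np.unique(labels):
        m = labels == j
        # Hypothetical fallback: clusters too small for a robust MAD use all points.
        ref = Z[m] if m.sum() >= min_size else Z
        scores[m] = projection_outlyingness(ref, Z[m], loss=loss)
    return scores


# Usage: two tight clusters plus one injected point (index 100) far from both.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(5.0, 0.1, (50, 2)),
               [[2.5, -3.0]]])
scores = two_stage_scores(X, n_components=2, gamma=0.5, k=2)
print("most outlying index:", int(np.argmax(scores)))
```

Scoring each point against the median and MAD of its own cluster, rather than of the whole sample, is what the localization stage buys in this sketch: a single global median would fall between the two clusters and inflate the scores of ordinary points. The random restarts in `kmeans` are a pragmatic guard against Lloyd's algorithm settling in a poor local optimum.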