This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that addresses two coexisting limitations of conventional projection-based methods: their reliance on a fixed statistical metric and their assumption of a single data structure. The framework combines three key concepts: (1) a generalized loss-based outlyingness measure (PLO) that replaces the fixed metric with flexible, adaptive loss functions such as our proposed SVM-like loss; (2) a global kernel PCA stage that linearizes non-linear data structures; and (3) a subsequent local clustering stage that handles multi-modal distributions. In comprehensive 5-fold cross-validation experiments on 10 benchmark datasets, with automated hyperparameter optimization, Two-Stage LKPLO achieves state-of-the-art performance. It significantly outperforms strong baselines on datasets whose challenging structures cause existing methods to fail, most notably multi-cluster data (Optdigits) and complex, high-dimensional data (Arrhythmia). Furthermore, an ablation study empirically confirms that combining the kernelization and localization stages is indispensable to this performance. This work contributes a powerful new tool for a significant class of outlier detection problems and underscores the importance of hybrid, multi-stage architectures.
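The three components listed above (global kernel PCA, local clustering, and a loss-based projection outlyingness) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not define PLO or the SVM-like loss, so everything below is an assumption made for illustration. In particular, the RBF kernel, the plain k-means step, the median/MAD standardization along random directions, the hinge-shaped `hinge_like_loss`, and all function and parameter names (`rbf_kernel_pca`, `kmeans`, `loss_outlyingness`, `two_stage_scores`, `gamma`, `k`, `n_dirs`) are hypothetical stand-ins.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Global stage (assumed form): RBF kernel PCA embedding of the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # double-center the kernel matrix
    vals, vecs = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]   # keep the largest ones
    vals = np.maximum(vals[idx], 1e-12)
    return vecs[:, idx] * np.sqrt(vals)           # coordinates of the training points

def kmeans(Z, k, rng, n_iter=50):
    """Local stage (assumed form): plain Lloyd's k-means, returns cluster labels."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):               # leave empty clusters unchanged
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def hinge_like_loss(t, margin=1.0):
    """Hypothetical stand-in for an SVM-like loss: zero inside the margin, linear outside."""
    return np.maximum(0.0, t - margin)

def loss_outlyingness(Z, rng, loss, n_dirs=200):
    """Projection outlyingness with a pluggable loss (assumed form).

    Each point is projected onto random unit directions, standardized by the
    median and MAD of the projections, passed through `loss`, and scored by
    the maximum over directions.
    """
    U = rng.normal(size=(n_dirs, Z.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    P = Z @ U.T                                   # shape (n_points, n_dirs)
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0) + 1e-12
    return loss(np.abs(P - med) / mad).max(axis=1)

def two_stage_scores(X, k=2, gamma=0.5, seed=0):
    """Kernel PCA, then clustering, then per-cluster loss-based outlyingness."""
    rng = np.random.default_rng(seed)
    Z = rbf_kernel_pca(X, n_components=2, gamma=gamma)
    labels = kmeans(Z, k, rng)
    scores = np.empty(len(X))
    for j in np.unique(labels):
        mask = labels == j
        scores[mask] = loss_outlyingness(Z[mask], rng, hinge_like_loss)
    return scores
```

Calling `two_stage_scores(X, k=2)` on an `(n, d)` array returns one non-negative score per row, with larger values indicating points that deviate more from their own cluster in the embedded space. Swapping `hinge_like_loss` for another function of the standardized deviation is the point at which a different loss would plug in under these assumptions.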