This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that overcomes two coexisting limitations of conventional projection-based methods: their reliance on a fixed statistical metric and their assumption of a single data structure. The framework combines three key concepts: (1) a generalized loss-based outlyingness measure (PLO) that replaces the fixed metric with flexible, adaptive loss functions, such as our proposed SVM-like loss; (2) a global kernel PCA stage that linearizes non-linear data structures; and (3) a subsequent local clustering stage that handles multi-modal distributions. Comprehensive 5-fold cross-validation experiments on 10 benchmark datasets, with automated hyperparameter optimization, demonstrate that Two-Stage LKPLO achieves state-of-the-art performance. It significantly outperforms strong baselines on datasets with challenging structures where existing methods fail, most notably multi-cluster data (Optdigits) and complex, high-dimensional data (Arrhythmia). Furthermore, an ablation study empirically confirms that combining the kernelization and localization stages is indispensable to this performance. This work contributes a powerful new tool for a significant class of outlier detection problems and underscores the importance of hybrid, multi-stage architectures.
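The three components named above can be read as a pipeline: embed globally, cluster locally, then score each point within its cluster. The following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not define PLO or the SVM-like loss, so the hinge-style `hinge_like_loss`, the median/MAD standardization, the maximum over random directions, the RBF kernel, and the plain k-means step are all illustrative choices, and every function name is hypothetical.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Global stage (assumed RBF kernel): embed X on the leading kernel principal components."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                               # double-centre the kernel matrix
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    vals = np.maximum(vals[idx], 1e-12)
    return vecs[:, idx] * np.sqrt(vals)          # coordinates of the training points

def kmeans(Z, k, n_iter=50, seed=0):
    """Local stage (assumed k-means): plain Lloyd iterations, one label per row."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def hinge_like_loss(z, margin=1.0):
    """Hypothetical stand-in for an SVM-like loss: zero inside the margin, linear outside."""
    return np.maximum(0.0, z - margin)

def projection_loss_outlyingness(Z, loss=hinge_like_loss, n_dirs=200, seed=0):
    """Assumed PLO form: worst-case loss of robustly standardized 1-D projections."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dirs, Z.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)     # random unit directions
    P = Z @ U.T                                       # shape (n_points, n_dirs)
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0) + 1e-12  # guard against zero spread
    return loss(np.abs(P - med) / mad).max(axis=1)

def two_stage_scores(X, n_components=2, gamma=1.0, k=2):
    """Kernel PCA, then clustering, then per-cluster loss-based scoring."""
    Z = rbf_kernel_pca(X, n_components, gamma)
    labels = kmeans(Z, k)
    scores = np.zeros(len(X))
    for j in np.unique(labels):
        mask = labels == j
        scores[mask] = projection_loss_outlyingness(Z[mask])
    return scores
```

Scoring within each cluster, rather than over the whole embedding, is what lets a multi-modal dataset be judged against its own local centre and spread. Swapping `hinge_like_loss` for another callable changes the outlyingness measure without touching the two stages.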