The Minimum Covariance Determinant (MCD) method is a widely adopted tool for robust estimation and outlier detection. In this paper, we introduce a new framework for model selection in MCD with spectral embedding based on the notion of stability. Our best subset algorithm leverages principal component analysis for dimension reduction, statistical depths for effective initialization, and concentration steps for subset refinement. Subsequently, we construct a bootstrap procedure to estimate the instability of the best subset algorithm. The parameter combination exhibiting minimal instability proves ideal for the purposes of high-dimensional outlier detection, while the instability path offers insights into the inlier/outlier structure. We rigorously benchmark the proposed framework against existing MCD variants and illustrate its practical utility on two spectra data sets and a cancer genomics data set.
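To make the best subset pipeline described above more concrete, the following is a minimal sketch of its three stages (PCA for dimension reduction, a depth-like initialization, and concentration steps) in Python with NumPy. It is not the authors' implementation. The function names, the embedding dimension `q`, and the subset size `h` are hypothetical names introduced for illustration. The initialization uses a simple coordinate-wise median/MAD outlyingness score as a stand-in for a statistical depth, which is an assumption, since the abstract does not say which depth is used. The bootstrap instability estimate is omitted because its definition is not given here.

```python
import numpy as np


def pca_scores(X, q):
    """Project centered data onto the top-q principal components (plain SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:q].T


def initial_subset(Z, h):
    """Return indices of the h points with the smallest robust outlyingness.

    Assumption: a coordinate-wise median/MAD score stands in for a
    statistical depth; a real depth function could be substituted here.
    """
    med = np.median(Z, axis=0)
    mad = np.median(np.abs(Z - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)  # guard against division by zero
    outlyingness = np.max(np.abs(Z - med) / mad, axis=1)
    return np.sort(np.argsort(outlyingness, kind="stable")[:h])


def concentration_steps(Z, subset, max_iter=100):
    """Refine a subset by repeatedly keeping the h points closest, in
    Mahalanobis distance, to the current subset's mean and covariance."""
    h = len(subset)
    for _ in range(max_iter):
        mu = Z[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(Z[subset], rowvar=False))
        diff = Z - mu
        # Squared Mahalanobis distances; pinv tolerates a singular covariance.
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
        new_subset = np.sort(np.argsort(d2, kind="stable")[:h])
        if np.array_equal(new_subset, subset):
            break  # the subset no longer changes
        subset = new_subset
    return subset


def best_subset(X, q, h):
    """Hypothetical wrapper chaining the three stages."""
    Z = pca_scores(X, q)
    return concentration_steps(Z, initial_subset(Z, h))
```

A call such as `best_subset(X, q=2, h=75)` returns the sorted row indices of the selected subset; rows outside it are the candidate outliers. In the framework described above, `q` and `h` would be the parameters chosen by minimizing the bootstrap instability rather than fixed by hand.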