With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimation and reduced predictive accuracy. This work proposes the Single-Parametric Principal Component Selection Operator (SPPCSO), a penalized estimation method that integrates single-parametric principal component regression with $L_{1}$ regularization and uses principal component information to adaptively adjust the shrinkage factor. This approach balances variable selection against coefficient estimation, keeping the model stable and the estimates robust even in high-dimensional, high-noise environments. The primary contribution lies in addressing the instability of traditional variable selection methods when applied to high-noise, high-dimensional correlated data. Theoretically, our method exhibits selection consistency and achieves a smaller estimation error bound than traditional penalized estimation approaches. Extensive numerical experiments demonstrate that SPPCSO not only delivers stable and reliable estimation in high-noise settings but also accurately distinguishes signal variables from noise variables in group-effect structured data with highly correlated noise variables, eliminating redundant variables and achieving more stable variable selection. Furthermore, SPPCSO successfully identifies disease-associated genes in gene expression data analysis, demonstrating its practical value. The results indicate that SPPCSO is well suited to high-dimensional variable selection, offering an efficient and interpretable solution for modeling correlated data.