An adaptive Cook's distance (ACD) is proposed for diagnosing influential observations in high-dimensional single-index models under multicollinearity and outlier contamination. ACD is a model-free technique built on sparse local linear gradients to temper leverage effects. In simulations spanning low- and high-dimensional designs with strong correlation, ACD with LASSO (ACD-LASSO) and SCAD (ACD-SCAD) penalties reduced masking and swamping relative to classical Cook's distance and local influence, as well as the DF-Model and Case-Weight adjusted solution for LASSO. Trimming the points flagged by ACD stabilizes variable selection while preserving core signals. Applications to two datasets, the 1960 US cities pollution study and a high-dimensional riboflavin genomics experiment, show consistent gains in selection stability and interpretability.