Penalized regression methods such as ridge regression rely heavily on the choice of a tuning (penalty) parameter, which is often selected via cross-validation. Differences in the value of the penalty parameter may lead to substantial differences in regression coefficient estimates and predictions. In this paper, we investigate the effect of single observations on the optimal choice of the tuning parameter, showing how the presence of influential points can change it dramatically. We classify such points as ``expanders'' or ``shrinkers'', based on their effect on the model complexity. Our approach supplies a visual exploratory tool to identify influential points, which is naturally implementable for high-dimensional data, where traditional approaches usually fail. Applications to simulated and real data examples, both low- and high-dimensional, are presented. The visual tool is implemented in the R package influridge.
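To make the idea of a single observation moving the cross-validated tuning parameter concrete, the following is a minimal sketch in Python with NumPy. It is not the interface of the influridge package and not necessarily the procedure used in the paper: it assumes ridge regression without an intercept, leave-one-out cross-validation over a fixed grid, and a simple "delete one observation and re-tune" comparison. The function names `loocv_error`, `optimal_lambda` and `lambda_shifts`, the simulated data, and the contaminated first observation are all hypothetical choices made for illustration.

```python
import numpy as np


def loocv_error(X, y, lam):
    """Leave-one-out CV mean squared error of ridge regression (no intercept).

    Uses the exact shortcut e_i / (1 - h_ii), where h_ii are the diagonal
    entries of the ridge hat matrix H = X (X'X + lam I)^{-1} X'.
    """
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)


def optimal_lambda(X, y, grid):
    """Return the grid value with the smallest leave-one-out CV error."""
    errors = [loocv_error(X, y, lam) for lam in grid]
    return grid[int(np.argmin(errors))]


def lambda_shifts(X, y, grid):
    """Optimal lambda on the full data and after deleting each observation."""
    lam_full = optimal_lambda(X, y, grid)
    lam_drop = np.array([
        optimal_lambda(np.delete(X, i, axis=0), np.delete(y, i), grid)
        for i in range(len(y))
    ])
    return lam_full, lam_drop


# Simulated example (assumed setup, not data from the paper).
rng = np.random.default_rng(0)
n, p = 30, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
y[0] += 10.0  # hypothetical contaminated observation

grid = np.logspace(-3, 3, 50)
lam_full, lam_drop = lambda_shifts(X, y, grid)

# Observations whose removal moves the selected lambda the most,
# measured on the log scale of the grid.
shift = np.log(lam_drop) - np.log(lam_full)
most_influential = np.argsort(-np.abs(shift))[:5]
print(lam_full, most_influential, shift[most_influential])
```

The sign of each entry in `shift` indicates whether removing that observation pushes the selected penalty up or down, which is the kind of distinction the two labels above refer to; which sign corresponds to which label is left open in this sketch. Because only the grid search is repeated, the same code runs unchanged when the number of predictors exceeds the number of observations, as long as the grid contains only positive penalties.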