Cross validation is widely used for selecting tuning parameters in regularization methods, but it is generally computationally intensive. To lessen its computational burden, approximation schemes such as generalized approximate cross validation (GACV) are often employed. However, such approximations may not work well when non-smooth loss functions are involved. As a case in point, approximate cross validation schemes for penalized quantile regression do not work well for extreme quantiles. In this paper, we propose a new algorithm to compute the leave-one-out cross validation scores exactly for quantile regression with a ridge penalty through a case-weight adjusted solution path. Resorting to the homotopy technique in optimization, we introduce a case weight for each individual data point as a continuous embedding parameter and decrease the weight gradually from one to zero to link the estimator based on the full data with those based on data with a case deleted. This allows us to design a solution path algorithm that computes all leave-one-out estimators very efficiently from the full-data solution. We show that the case-weight adjusted solution path is piecewise linear in the weight parameter, and using the solution path, we examine case influences comprehensively and observe that different modes of case influence emerge, depending on the specified quantile, data dimensions, and penalty parameter. We further illustrate the utility of the proposed algorithm in real-world applications.
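The case-weight embedding described above can be illustrated with a minimal toy sketch (not the paper's path algorithm, which exploits piecewise linearity rather than re-solving). All names below are hypothetical, and an intercept-only model with a crude grid-search solver is assumed purely for illustration: attaching a weight w to one observation's check loss recovers the full-data fit at w = 1 and the leave-one-out fit at w = 0.

```python
def pinball(r, tau):
    """Check (pinball) loss rho_tau(r) used in quantile regression."""
    return r * tau if r >= 0 else r * (tau - 1.0)

def objective(beta, y, tau, lam, i=None, w=1.0):
    """Ridge-penalized quantile objective with case weight w on point i."""
    total = 0.0
    for j, yj in enumerate(y):
        loss = pinball(yj - beta, tau)
        total += w * loss if j == i else loss
    return total + 0.5 * lam * beta * beta

def argmin_grid(y, tau, lam, i=None, w=1.0, lo=-5.0, hi=5.0, steps=20001):
    """Crude grid-search minimizer; adequate for this 1-D toy problem."""
    best_b, best_v = lo, float("inf")
    for k in range(steps):
        b = lo + (hi - lo) * k / (steps - 1)
        v = objective(b, y, tau, lam, i, w)
        if v < best_v:
            best_b, best_v = b, v
    return best_b

y = [0.3, -1.2, 2.5, 0.8, -0.4]
tau, lam, i = 0.5, 0.1, 2                          # delete the 3rd case

full_fit = argmin_grid(y, tau, lam)                # w = 1 on every case
w0_fit = argmin_grid(y, tau, lam, i=i, w=0.0)      # weight on case i driven to 0
loo_fit = argmin_grid([v for j, v in enumerate(y) if j != i], tau, lam)

assert abs(w0_fit - loo_fit) < 1e-3  # w = 0 recovers the leave-one-out fit
```

The proposed algorithm avoids this brute-force re-solving: because the solution path is piecewise linear in w, each leave-one-out estimator can be traced from the full-data solution by following a small number of linear segments.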