In this paper, we prove a new theorem that yields an update formula for the residuals of a linear regression model. The formula gives the exact k-fold cross-validation residuals for any choice of cross-validation strategy, without refitting the model. The sizes of the required matrix inversions are bounded by the cross-validation segment sizes, and the inversions can be executed efficiently in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of the theorem. For situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximation of the cross-validated residuals and the associated Predicted Residual Sum of Squares (PRESS) statistic. We also suggest strategies for efficiently estimating the minimum PRESS value and the full PRESS function over a selected interval of regularisation values. The computational effectiveness of the resulting parameter selection for ridge and Tikhonov regression, which rests on our theoretical findings and heuristic arguments, is demonstrated in several applications with real, highly multivariate datasets.
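To make the idea of a refit-free residual update concrete, the following is a minimal sketch for ridge regression without an intercept. It assumes the standard segment-wise identity: for a held-out segment S, the cross-validated residuals equal (I - H_SS)^{-1} e_S, where H is the hat matrix of the full-data fit and e the full-data residuals. This identity is used here for illustration only; the theorem in the paper may be stated more generally (for example, with centring or a general Tikhonov penalty). All function names, the synthetic data and the value of `lam` are hypothetical.

```python
import numpy as np

def ridge_hat_matrix(X, lam):
    """Hat matrix H = X (X'X + lam*I)^{-1} X' of a ridge fit without intercept."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def cv_residuals_update(X, y, lam, folds):
    """Cross-validated residuals from one full-data fit (no refitting).

    Assumed identity: for each held-out segment S,
        e_cv[S] = (I - H[S, S])^{-1} e[S].
    Only one |S| x |S| system is solved per segment, and the segments
    are independent of each other, so the loop could run in parallel.
    """
    H = ridge_hat_matrix(X, lam)
    e = y - H @ y                      # ordinary (full-data) residuals
    e_cv = np.empty_like(e)
    for idx in folds:
        A = np.eye(len(idx)) - H[np.ix_(idx, idx)]
        e_cv[idx] = np.linalg.solve(A, e[idx])
    return e_cv

def cv_residuals_refit(X, y, lam, folds):
    """Reference implementation: refit the ridge model once per segment."""
    n, p = X.shape
    e_cv = np.empty(n)
    for idx in folds:
        train = np.ones(n, dtype=bool)
        train[idx] = False
        Xt, yt = X[train], y[train]
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(p), Xt.T @ yt)
        e_cv[idx] = y[idx] - X[idx] @ beta
    return e_cv

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, p, lam = 40, 6, 0.5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
folds = np.array_split(rng.permutation(n), 5)    # five segments of eight samples

fast = cv_residuals_update(X, y, lam, folds)
slow = cv_residuals_refit(X, y, lam, folds)
press = float(np.sum(fast ** 2))                 # PRESS statistic for this lam

# Leave-one-out special case: segments of size one give e_i / (1 - h_ii).
H = ridge_hat_matrix(X, lam)
e = y - H @ y
loo_closed_form = e / (1.0 - np.diag(H))
loo_update = cv_residuals_update(X, y, lam, [np.array([i]) for i in range(n)])
```

In this sketch, `fast` and `slow` should agree to numerical precision, and `loo_update` should reproduce the familiar leave-one-out expression. Evaluating `press` over a grid of `lam` values would trace out a PRESS function of the kind described above, although each value here recomputes the hat matrix from scratch and is therefore not the efficient strategy the paper proposes.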