It is shown how to efficiently and accurately compute and optimize a range of cross validation criteria for a wide range of models estimated by minimizing a quadratically penalized smooth loss. Example models include generalized additive models for location, scale and shape, and smooth additive quantile regression. Example losses include negative log likelihoods and smooth quantile losses. Example cross validation criteria include leave-out-neighbourhood cross validation, for dealing with un-modelled short range autocorrelation, as well as the more familiar leave-one-out cross validation. For a $p$ coefficient model of $n$ data, estimable at $O(np^2)$ computational cost, the general $O(n^2p^2)$ cost of ordinary cross validation is reduced to $O(np^2)$, while computing the cross validation criterion to $O(p^3n^{-2})$ accuracy. This is achieved by directly approximating the model coefficient estimates under data subset omission, via efficiently computed single-step Newton updates of the full data coefficient estimates. Optimization of the resulting cross validation criterion, with respect to multiple smoothing/precision parameters, can be achieved efficiently using quasi-Newton optimization, adapted to deal with the indefiniteness that occurs when the optimal value for a smoothing parameter tends to infinity. The link between cross validation and the jackknife can be exploited to achieve reasonably well calibrated uncertainty quantification for the model coefficients in non-standard settings, such as leaving out neighbourhoods under residual autocorrelation, or quantile regression. Several practical examples are provided, focussing particularly on dealing with un-modelled autocorrelation.
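To make the single-step Newton idea concrete, here is a minimal sketch for the simplest case: leave-one-out omission in a ridge-penalized logistic regression. It is an illustration under stated assumptions, not the method's actual implementation. The function names (`fit_penalized_logistic`, `loo_newton_coefs`), the penalty matrix `S`, and the simulated data are all hypothetical. Because omitting one observation changes the Hessian by a rank-one term, the sketch uses the Sherman-Morrison identity to obtain every update from a single solve with the full-data Hessian; leaving out whole neighbourhoods would need a more general low-rank update, which is not shown.

```python
import numpy as np


def fit_penalized_logistic(X, y, S, n_iter=50, tol=1e-10):
    """Minimize sum_i nll_i(beta) + 0.5 * beta' S beta by plain Newton iteration."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (mu - y) + S @ beta
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None]) + S
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def loo_newton_coefs(X, y, S, beta_hat):
    """One Newton step from beta_hat towards each leave-one-out estimate.

    Dropping observation i leaves gradient -g_i and Hessian H - H_i at
    beta_hat, where g_i = x_i (mu_i - y_i) and H_i = w_i x_i x_i'.
    Sherman-Morrison then gives
        beta_(-i) ~ beta_hat + H^{-1} x_i (mu_i - y_i) / (1 - w_i h_i),
    with h_i = x_i' H^{-1} x_i. Returns an (n, p) array, one row per omission.
    """
    mu = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    w = mu * (1.0 - mu)
    hess = X.T @ (X * w[:, None]) + S
    hinv_xt = np.linalg.solve(hess, X.T)        # p x n, column i is H^{-1} x_i
    h = np.einsum("ij,ji->i", X, hinv_xt)       # h_i = x_i' H^{-1} x_i
    scale = (mu - y) / (1.0 - w * h)
    return beta_hat + (hinv_xt * scale).T


# Simulated data (an assumption of this sketch, not taken from the source).
rng = np.random.default_rng(0)
n, p = 500, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([0.5, 1.0, -1.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
S = np.eye(p)                                   # simple ridge penalty

beta_hat = fit_penalized_logistic(X, y, S)
beta_loo = loo_newton_coefs(X, y, S, beta_hat)  # all n approximations at once
```

Each row of `beta_loo` can be used to predict the omitted observation, and summing the resulting losses gives an approximate leave-one-out criterion without any refitting. In this sketch the work beyond the original fit is dominated by forming the Hessian and one $p \times n$ solve, in line with the $O(np^2)$ cost quoted above.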