On Neighbourhood Cross Validation

It is shown how to efficiently and accurately compute and optimize a range of cross validation criteria for a wide range of models estimated by minimizing a quadratically penalized smooth loss. Example models include generalized additive models for location, scale and shape, and smooth additive quantile regression. Example losses include negative log likelihoods and smooth quantile losses. Example cross validation criteria include leave-out-neighbourhood cross validation, for dealing with un-modelled short range autocorrelation, as well as the more familiar leave-one-out cross validation. For a $p$ coefficient model of $n$ data, estimable at $O(np^2)$ computational cost, the general $O(n^2p^2)$ cost of ordinary cross validation is reduced to $O(np^2)$, computing the cross validation criterion to $O(p^3n^{-2})$ accuracy. This is achieved by directly approximating the model coefficient estimates under data subset omission, via efficiently computed single step Newton updates of the full data coefficient estimates. Optimization of the resulting cross validation criterion, with respect to multiple smoothing/precision parameters, can be achieved efficiently using quasi-Newton optimization, adapted to deal with the indefiniteness that occurs when the optimal value for a smoothing parameter tends to infinity. The link between cross validation and the jackknife can be exploited to achieve reasonably well calibrated uncertainty quantification for the model coefficients in non-standard settings, such as leaving out neighbourhoods under residual autocorrelation, or quantile regression. Several practical examples are provided, focussing particularly on dealing with un-modelled autocorrelation.
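The single step Newton update described above can be illustrated on the simplest case, a quadratically penalized least squares fit. The sketch below is not the paper's implementation: the function names, the sliding-window neighbourhoods of `half_width` points, and the identity penalty used in the usage note are all assumptions made for illustration. For a quadratic loss the Newton step from the full data estimate gives the leave-out estimate exactly; for the general smooth losses the abstract covers it is only an approximation. Each neighbourhood is handled here with a direct $O(p^3)$ solve, so this does not reproduce the efficient updating behind the stated $O(np^2)$ cost.

```python
import numpy as np

def fit_penalized_ls(X, y, S, lam):
    """Minimize 0.5*||y - X b||^2 + 0.5*lam * b' S b; return estimate and Hessian."""
    H = X.T @ X + lam * S                # Hessian of the penalized loss
    beta = np.linalg.solve(H, X.T @ y)   # full data coefficient estimate
    return beta, H

def neighbourhood(i, n, half_width):
    """Hypothetical neighbourhood: indices within half_width of point i."""
    return np.arange(max(0, i - half_width), min(n, i + half_width + 1))

def ncv_newton(X, y, S, lam, half_width=2):
    """Leave-out-neighbourhood CV score using one Newton step per neighbourhood."""
    n = X.shape[0]
    beta, H = fit_penalized_ls(X, y, S, lam)
    resid = y - X @ beta
    score = 0.0
    for i in range(n):
        nb = neighbourhood(i, n, half_width)
        Xa = X[nb]
        # Gradient of the leave-out loss at the full data estimate
        # (the full data gradient is zero there).
        g = Xa.T @ resid[nb]
        # Hessian of the leave-out loss: drop the omitted rows' contribution.
        H_minus = H - Xa.T @ Xa
        beta_minus = beta - np.linalg.solve(H_minus, g)  # single Newton step
        score += (y[i] - X[i] @ beta_minus) ** 2         # predict omitted point i
    return score

def ncv_refit(X, y, S, lam, half_width=2):
    """Brute-force reference: refit from scratch with each neighbourhood omitted."""
    n = X.shape[0]
    score = 0.0
    for i in range(n):
        keep = np.setdiff1d(np.arange(n), neighbourhood(i, n, half_width))
        beta_minus, _ = fit_penalized_ls(X[keep], y[keep], S, lam)
        score += (y[i] - X[i] @ beta_minus) ** 2
    return score
```

With a positive definite penalty (for example `S = np.eye(p)` and `lam > 0`) every leave-out Hessian is invertible, and `ncv_newton` and `ncv_refit` should agree to rounding error. Setting `half_width=0` recovers ordinary leave-one-out cross validation.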


