Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits the data, trains the prediction model on the training set, evaluates the model's performance on the test set, and averages that performance across the different data splits. A well-known criticism is that such a cross-validation procedure does not directly estimate the performance of the particular model recommended for future use. In this paper, we propose a new method to estimate the performance of a model trained on a specific (random) training set. A naive estimator can be obtained by applying the model to a disjoint testing set. Surprisingly, cross-validation estimators computed from other random splits can be used to improve this naive estimator within a random-effects model framework. We develop two estimators -- a hierarchical Bayesian estimator and an empirical Bayes estimator -- that perform similarly to or better than both the conventional cross-validation estimator and the naive single-split estimator. Simulations and a real-data example demonstrate the superior performance of the proposed method.
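The split-train-evaluate-average loop described above, and the naive single-split estimator it is contrasted with, can be sketched as follows. This is a minimal illustration using a toy least-squares slope model on synthetic data; the model, data, and fold scheme are placeholders, not the paper's estimators.

```python
import random

random.seed(0)

# Synthetic data (illustrative only): y = 2*x + Gaussian noise.
n = 100
data = [(x, 2 * x + random.gauss(0, 1)) for x in (random.random() for _ in range(n))]

def fit_slope(train):
    # Toy model: least-squares slope b for y ~ b*x through the origin.
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / sxx

def mse(b, test):
    # Test-set mean squared error of the fitted slope.
    return sum((y - b * x) ** 2 for x, y in test) / len(test)

def kfold_cv(data, k=5):
    # Conventional cross-validation: train on k-1 folds, test on the
    # held-out fold, and average the test error over all k splits.
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        errors.append(mse(fit_slope(train), test))
    return sum(errors) / k, errors

cv_estimate, fold_errors = kfold_cv(data)

# The naive estimator discussed in the abstract targets one specific
# split's model: a single fold's test error rather than the average.
naive_estimate = fold_errors[0]
```

The point of contrast: `cv_estimate` averages over splits and so describes the procedure, while `naive_estimate` describes one particular trained model; the paper's proposal combines the two within a random-effects framework.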