We revisit the problem of ensuring strong test set performance via cross-validation, and propose a nested $k$-fold cross-validation scheme that selects hyperparameters by minimizing a weighted sum of the usual cross-validation metric and an empirical model-stability measure. The weight on the stability term is itself chosen via a nested cross-validation procedure. This reduces the risk that an unstable model performs well on the validation set but poorly on the test set. We benchmark our procedure on a suite of $13$ real-world datasets and find that, compared to $k$-fold cross-validation over the same hyperparameters, it improves the out-of-sample MSE for sparse ridge regression and CART by $4\%$ and $2\%$ respectively on average, but has no impact on XGBoost. It also reduces the user's out-of-sample disappointment, sometimes significantly. For instance, for sparse ridge regression, the nested $k$-fold cross-validation error is on average $0.9\%$ lower than the test set error, while the $k$-fold cross-validation error is on average $21.8\%$ lower than the test set error. Thus, for unstable models such as sparse regression and CART, our approach improves test set performance and reduces out-of-sample disappointment.
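To make the selection rule concrete, the following is a minimal sketch of choosing a hyperparameter by minimizing a cross-validation error plus a weighted stability penalty. It is not the procedure evaluated above: the one-dimensional ridge model, the function names, and in particular the stability proxy (the variance of the fitted coefficient across folds) are assumptions made purely for illustration, and the weight is passed in as a fixed number rather than chosen by a nested cross-validation loop.

```python
from statistics import mean, pvariance


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge fit without intercept: beta = <x, y> / (<x, x> + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def kfold_error_and_instability(xs, ys, lam, k):
    """Return (mean validation MSE, variance of the fold coefficients).

    The second value is a hypothetical stability proxy chosen for this
    sketch; it is not the stability measure defined by the source.
    """
    n = len(xs)
    fold_errors, fold_betas = [], []
    for f in range(k):
        # Deterministic folds: observation i belongs to fold i mod k.
        val_idx = [i for i in range(n) if i % k == f]
        train_idx = [i for i in range(n) if i % k != f]
        beta = fit_ridge_1d(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx], lam
        )
        fold_errors.append(mean((ys[i] - beta * xs[i]) ** 2 for i in val_idx))
        fold_betas.append(beta)
    return mean(fold_errors), pvariance(fold_betas)


def select_lambda(xs, ys, lams, k, weight):
    """Pick the candidate minimizing CV error + weight * instability.

    With weight = 0 this reduces to ordinary k-fold cross-validation.
    """
    def score(lam):
        err, instability = kfold_error_and_instability(xs, ys, lam, k)
        return err + weight * instability

    return min(lams, key=score)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
    candidates = [0.0, 1.0, 10.0]
    print(select_lambda(xs, ys, candidates, k=3, weight=0.0))
    print(select_lambda(xs, ys, candidates, k=3, weight=5.0))
```

Setting `weight=0.0` recovers plain $k$-fold selection, so comparing the two calls shows how the stability term can shift the choice. In a nested variant, the weight would itself be tuned on inner folds, which this sketch leaves out.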