Stability Regularized Cross-Validation

From arXiv. Some of this material previously appeared in 2306.14851v2, which we have split into two papers (this one and 2306.14851v3) because it contained two ideas that need separate papers.

We revisit the problem of ensuring strong test set performance via cross-validation, and propose a nested $k$-fold cross-validation scheme that selects hyperparameters by minimizing a weighted sum of the usual cross-validation metric and an empirical model-stability measure. The weight on the stability term is itself chosen via a nested cross-validation procedure. This reduces the risk that instability produces strong validation set performance but poor test set performance. We benchmark our procedure on a suite of $13$ real-world datasets and find that, compared to $k$-fold cross-validation over the same hyperparameters, it improves the out-of-sample MSE for sparse ridge regression and CART by $4\%$ and $2\%$ respectively on average, but has no impact on XGBoost. It also reduces the user's out-of-sample disappointment, sometimes significantly. For instance, for sparse ridge regression, the nested $k$-fold cross-validation error is on average $0.9\%$ lower than the test set error, while the $k$-fold cross-validation error is on average $21.8\%$ lower than the test set error. Thus, for unstable models such as sparse regression and CART, our approach improves test set performance and reduces out-of-sample disappointment.
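The selection rule described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the model is a one-feature ridge regression with no intercept, the instability proxy (squared gap between each fold model's slope and the full-data slope) is a stand-in for whatever stability measure the paper actually uses, and all function names, grids, and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for a single feature with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(beta, xs, ys):
    """Mean squared error of the slope-only model y = beta * x."""
    return mean((y - beta * x) ** 2 for x, y in zip(xs, ys))


def make_folds(n, k):
    """Partition indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def split(xs, ys, fold):
    """Return (train_x, train_y, held_x, held_y) for one held-out fold."""
    held = set(fold)
    train = [i for i in range(len(xs)) if i not in held]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in fold], [ys[i] for i in fold])


def cv_and_instability(xs, ys, lam, k):
    """Return (k-fold CV error, instability proxy) for one hyperparameter.

    ASSUMPTION: instability is the mean squared gap between each fold
    model's slope and the full-data slope; the paper's measure may differ.
    """
    full_beta = fit_ridge_1d(xs, ys, lam)
    errors, gaps = [], []
    for fold in make_folds(len(xs), k):
        tx, ty, hx, hy = split(xs, ys, fold)
        beta = fit_ridge_1d(tx, ty, lam)
        errors.append(mse(beta, hx, hy))
        gaps.append((beta - full_beta) ** 2)
    return mean(errors), mean(gaps)


def select_hyperparameter(xs, ys, grid, delta, k=5):
    """Pick the grid value minimizing CV error + delta * instability."""
    def objective(lam):
        err, inst = cv_and_instability(xs, ys, lam, k)
        return err + delta * inst
    return min(grid, key=objective)


def select_weight(xs, ys, grid, deltas, k=5):
    """Choose the stability weight delta by an outer (nested) k-fold loop."""
    def outer_error(delta):
        errors = []
        for fold in make_folds(len(xs), k):
            tx, ty, hx, hy = split(xs, ys, fold)
            lam = select_hyperparameter(tx, ty, grid, delta, k)
            errors.append(mse(fit_ridge_1d(tx, ty, lam), hx, hy))
        return mean(errors)
    return min(deltas, key=outer_error)


# Hypothetical synthetic data: y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
grid = [0.01, 0.1, 1.0, 10.0]
deltas = [0.0, 0.1, 1.0]

best_delta = select_weight(xs, ys, grid, deltas)
best_lam = select_hyperparameter(xs, ys, grid, best_delta)
print(best_delta, best_lam)
```

Setting `delta` to zero recovers ordinary $k$-fold selection, so the sketch makes the comparison in the abstract concrete: the two procedures search the same grid and differ only in the penalty on instability.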


Related content

Cross-validation

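Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of several similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent dataset. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which it is trained (the training dataset) and a dataset of unknown, or first-seen, data against which it is tested (the validation dataset or test set). The goal of cross-validation is to test the model's ability to predict new data that was not used to estimate it, in order to flag problems such as overfitting or selection bias and to give insight into how the model will generalize to an independent dataset (that is, an unknown dataset, for instance one drawn from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (the training set), and validating the analysis on the other subset (the validation set or test set). To reduce variability, most methods perform multiple rounds of cross-validation using different partitions and combine the validation results over the rounds, for example by averaging, to estimate the model's predictive performance. In summary, cross-validation averages measures of predictive fit across rounds to obtain a more accurate estimate of the model's predictive performance.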
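As a small illustration of how held-out evaluation flags overfitting, the following Python sketch compares the training error of a 1-nearest-neighbor predictor with its $k$-fold cross-validation error. The predictor, the synthetic data, and all names here are assumptions chosen for illustration; they do not come from the source.

```python
import random
from statistics import mean


def predict_1nn(train, x):
    """Predict with the label of the nearest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def mse(train, test):
    """Mean squared error on `test` of a 1-NN predictor built from `train`."""
    return mean((y - predict_1nn(train, x)) ** 2 for x, y in test)


def k_fold_error(data, k):
    """Average held-out error over k complementary train/validation splits."""
    fold_errors = []
    for i in range(k):
        validation = data[i::k]
        training = [p for j, p in enumerate(data) if j % k != i]
        fold_errors.append(mse(training, validation))
    return mean(fold_errors)


# Hypothetical synthetic data: y = x plus Gaussian noise.
rng = random.Random(1)
data = [(x, x + rng.gauss(0, 1))
        for x in (rng.uniform(0, 10) for _ in range(50))]

train_error = mse(data, data)        # each point is its own nearest neighbor
cv_error = k_fold_error(data, k=5)   # each point is predicted by a model that never saw it
print(train_error, cv_error)
```

The training error is zero because the model memorizes every point, whereas the cross-validation error reflects the noise the model cannot predict. That gap is the overfitting signal the paragraph above describes.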
