Cross-validation (CV) is known to provide asymptotically exact tests and confidence intervals for model improvement, but only when the model comparison is relatively stable. Surprisingly, we prove that even simple, individually stable models can generate relatively unstable comparisons, calling into question the validity of CV inference. Specifically, we show that the Lasso and its close cousin, soft-thresholding, generate relatively unstable comparisons and invalid CV inferences, even in the most favorable of learning settings and even when both models are individually stable. These findings highlight the importance of verifying relative stability before deploying CV for model comparison.
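To make the objects in the abstract concrete, the following is a minimal sketch of the soft-thresholding operator and of a standard normal-approximation k-fold CV interval for the improvement of one model over another. It is an illustration under stated assumptions, not the construction analyzed in the paper: the one-dimensional mean-estimation setting, the squared-error loss, the Gaussian data, the thresholds, and the names `soft_threshold` and `cv_improvement_interval` are all hypothetical choices made here.

```python
import numpy as np


def soft_threshold(x, lam):
    """Soft-thresholding: shrink x toward zero by lam, setting small values to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def cv_improvement_interval(y, lam_a, lam_b, k=10, z=1.96, seed=0):
    """Normal-approximation k-fold CV interval for mean[loss(A) - loss(B)].

    Assumed setting (not from the source): each model predicts a single
    number, the soft-thresholded training mean, and is scored by squared
    error on the held-out fold.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    diffs = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        train_mean = y[train_mask].mean()
        pred_a = soft_threshold(train_mean, lam_a)
        pred_b = soft_threshold(train_mean, lam_b)
        # Per-point loss difference between model A and model B.
        diffs[test_idx] = (y[test_idx] - pred_a) ** 2 - (y[test_idx] - pred_b) ** 2
    center = diffs.mean()
    half_width = z * diffs.std(ddof=1) / np.sqrt(n)
    return center, center - half_width, center + half_width


# Usage: compare two thresholds on synthetic Gaussian data (assumed values).
rng = np.random.default_rng(1)
y = rng.normal(loc=0.1, scale=1.0, size=500)
center, lower, upper = cv_improvement_interval(y, lam_a=0.05, lam_b=0.10)
print(f"estimated improvement: {center:.4f}, interval: [{lower:.4f}, {upper:.4f}]")
```

The interval treats the per-point loss differences as approximately independent, which is the kind of approximation that the abstract says can fail when the comparison is not relatively stable. The sketch only shows how such an interval is computed; it does not by itself demonstrate validity or invalidity.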