Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain wide open. This motivates our investigation into how properties of algorithm-distribution pairs affect the choice of the number of folds in $k$-fold CV. Our first contribution is a new decomposition of the mean-squared error of cross-validation for risk estimation, which explicitly captures the correlations of error estimates across overlapping folds and relies on a new notion of algorithmic stability, squared loss stability, that is considerably weaker than the hypothesis stability typically required in comparable work. Furthermore, we prove:

1. For any learning algorithm that minimizes empirical risk, the mean-squared error of the $k$-fold cross-validation estimator $\widehat{L}_{\mathrm{CV}}^{(k)}$ of the population risk $L_{D}$ satisfies the minimax lower bound
\[
\min_{k \mid n} \max_{D} \mathbb{E}\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{D}\big)^{2}\right] = \Omega\big(\sqrt{k^*}/n\big),
\]
where $n$ is the sample size, $k$ the number of folds, and $k^*$ the number of folds attaining the minimax optimum. Thus, even under idealized conditions, CV with a large number of folds cannot attain the order-$1/n$ optimum achievable by a validation set of size $n$, reflecting an inherent penalty caused by dependence between folds.

2. Complementing this, we exhibit learning rules for which
\[
\max_{D}\mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{D}\big)^{2}\right] = \Omega(k/n),
\]
matching, up to constants, the accuracy of a hold-out estimator based on a single fold of size $n/k$.

Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes.
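For orientation, the following generic expansion (the standard bias-variance decomposition of an average of fold-wise hold-out estimates, not the paper's exact decomposition) shows where the cross-fold correlations enter. Writing $\widehat{L}_{\mathrm{CV}}^{(k)} = \tfrac{1}{k}\sum_{i=1}^{k}\widehat{L}_i$, with $\widehat{L}_i$ the error estimate on the $i$-th fold,
\[
\mathbb{E}\big[(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{D})^2\big]
= \big(\mathbb{E}[\widehat{L}_{\mathrm{CV}}^{(k)}] - L_{D}\big)^2
+ \frac{1}{k^2}\sum_{i=1}^{k}\operatorname{Var}\big(\widehat{L}_i\big)
+ \frac{1}{k^2}\sum_{i \neq j}\operatorname{Cov}\big(\widehat{L}_i, \widehat{L}_j\big).
\]
The covariance terms are generally nonzero because the training sets of distinct folds overlap; controlling them is where a stability notion such as squared loss stability would plausibly enter.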
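For intuition only, here is a minimal Monte Carlo sketch comparing the mean-squared error of the $k$-fold CV risk estimate with that of a hold-out estimate on a single fold of size $n/k$. The learning rule (a constant mean predictor), the Gaussian data model, and all numerical choices are assumptions made for the demonstration; the sketch does not reproduce the paper's lower-bound constructions.

```python
import numpy as np

rng = np.random.default_rng(0)


def cv_risk_estimate(y, k):
    """k-fold CV estimate of the squared-error risk of the mean predictor
    (an illustrative empirical-risk-minimizing rule, not the paper's construction)."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    fold_errs = []
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        yhat = y[train].mean()                           # fit a constant by least squares
        fold_errs.append(np.mean((y[idx] - yhat) ** 2))  # error on the held-out fold
    return float(np.mean(fold_errs))


# Illustrative setup (assumptions for this demo only): y ~ N(0, 1), squared loss,
# target L_D = risk of the mean predictor fit on all n points, i.e. 1 + 1/n.
n = 60
L_D = 1.0 + 1.0 / n
for k in (2, 5, 10, 30, 60):
    m = n // k                                           # size of a single fold
    mse_cv, mse_ho = [], []
    for _ in range(2000):
        y = rng.normal(0.0, 1.0, size=n)
        mse_cv.append((cv_risk_estimate(y, k) - L_D) ** 2)
        yhat = y[m:].mean()                              # hold-out: train on n - m points
        mse_ho.append((np.mean((y[:m] - yhat) ** 2) - L_D) ** 2)
    print(f"k={k:3d}   CV MSE={np.mean(mse_cv):.4f}   hold-out MSE={np.mean(mse_ho):.4f}")
```

In this toy setting the CV estimate is typically more accurate than the single-fold hold-out estimate, while its accuracy still degrades as $k$ shrinks the fold size; the paper's results characterize how far such gains can extend in the worst case over distributions.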