A Modern Theory of Cross-Validation through the Lens of Stability

Modern data analysis and statistical learning are marked by complex data structures and black-box algorithms. Data complexity stems from technologies such as imaging, remote sensing, wearable devices, and genomic sequencing. At the same time, black-box models, especially deep neural networks, have achieved impressive results. This combination raises new challenges for uncertainty quantification and statistical inference, which we refer to as "black-box inference." Black-box inference is difficult due to the lack of traditional modeling assumptions and the opaque behavior of modern estimators. These factors make it hard to characterize the distribution of estimation errors. A popular solution is post-hoc randomization, which, under mild assumptions such as exchangeability, can yield valid uncertainty quantification. Such methods range from classical techniques like permutation tests, the jackknife, and the bootstrap to more recent innovations like conformal inference. These approaches typically require little knowledge of data distributions or the internal workings of estimators. Many rely on the idea that estimators behave similarly under small perturbations of the data -- a concept formalized as stability. Over time, stability has become a key principle in data science, influencing research on generalization error, privacy, and adaptive inference. This article investigates cross-validation (CV) -- a widely used resampling method -- through the lens of stability. We first review recent theoretical results on CV for estimating generalization error and model selection under stability assumptions. We then examine uncertainty quantification for CV-based risk estimates. Together, these insights yield new theory and tools, which we apply to topics including model selection, selective inference, and conformal prediction.
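To make the two central objects of the abstract concrete, the following is a minimal sketch, not the article's own method. It computes a K-fold CV estimate of squared-error risk and a simple leave-one-out perturbation of the loss, which is one rough empirical proxy for the stability notion mentioned above. The ridge learner, the function names (`fit_ridge`, `kfold_cv_risk`, `loo_loss_perturbation`), and the simulated data are all assumptions chosen for illustration; any black-box fitting routine could take the learner's place.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Ridge coefficients; a hypothetical stand-in for a black-box learner."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def kfold_cv_risk(X, y, k=5, lam=1.0, seed=0):
    """Return the K-fold CV risk estimate and the per-point held-out losses."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    losses = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        beta = fit_ridge(X[train_mask], y[train_mask], lam)
        # Squared-error loss on the held-out fold only.
        losses[test_idx] = (y[test_idx] - X[test_idx] @ beta) ** 2
    return losses.mean(), losses


def loo_loss_perturbation(X, y, x0, y0, lam=1.0):
    """Mean absolute change in the loss at (x0, y0) when one training
    point is dropped; small values suggest a stable learner (assumed proxy)."""
    n = len(y)
    base = (y0 - x0 @ fit_ridge(X, y, lam)) ** 2
    diffs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        perturbed = (y0 - x0 @ fit_ridge(X[keep], y[keep], lam)) ** 2
        diffs[i] = abs(perturbed - base)
    return diffs.mean()


if __name__ == "__main__":
    # Simulated linear data (illustrative only).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
    risk, _ = kfold_cv_risk(X, y, k=5)
    pert = loo_loss_perturbation(X[:100], y[:100], X[150], y[150])
    print("CV risk estimate:", risk)
    print("Leave-one-out loss perturbation:", pert)
```

The per-point losses are returned alongside their mean because uncertainty quantification for the CV risk estimate starts from their spread. Note that losses from the same fold share a fitted model, so treating them as independent is itself an assumption.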
