Leave-one-out (LOO) prediction provides a principled, data-dependent measure of generalization, yet guarantees in fully transductive settings remain poorly understood beyond specialized models. We introduce Median of Level-Set Aggregation (MLSA), a general aggregation procedure based on empirical-risk level sets around the empirical risk minimizer (ERM). For arbitrary fixed datasets and losses satisfying a mild monotonicity condition, we establish a multiplicative oracle inequality for the LOO error of the form \[ \mathrm{LOO}_S(\hat{h}) \;\le\; C \cdot \frac{1}{n} \min_{h\in H} L_S(h) \;+\; \frac{\mathrm{Comp}(S,H,\ell)}{n}, \qquad C>1. \] The analysis rests on a local level-set growth condition controlling how the set of near-optimal empirical-risk minimizers expands as the tolerance increases. We verify this condition in several canonical settings. For classification with VC classes under the 0-1 loss, the resulting complexity scales as $O(d \log n)$, where $d$ is the VC dimension. For finite hypothesis classes under bounded loss and finite density classes under log loss, it scales as $O(\log |H|)$ and $O(\log |P|)$, respectively. For logistic regression with bounded covariates and parameters, a volumetric argument using the empirical covariance matrix yields complexity of order $O(d \log n)$, up to problem-dependent factors.
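To fix ideas, one formalization consistent with the abstract (the notation here is illustrative, not taken from the paper) defines the empirical-risk level set of width $\varepsilon$ around the ERM as \[ H_S(\varepsilon) \;:=\; \{\, h \in H : L_S(h) \le \min_{h'\in H} L_S(h') + \varepsilon \,\}, \] with MLSA predicting, at each held-out point $x$, the median of the level-set predictions, $\hat{h}(x) := \mathrm{median}\{\, h(x) : h \in H_S(\varepsilon) \,\}$. Under this reading, the growth condition bounds how a size measure of $H_S(\varepsilon)$ (cardinality, covering number, or volume, depending on the setting) increases with $\varepsilon$, and the complexity term $\mathrm{Comp}(S,H,\ell)$ summarizes that growth.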
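The abstract does not spell out the procedure, so the following is a minimal sketch of one natural reading of the name: form the level set of near-minimizers of the empirical risk and aggregate their predictions by a median. All names below (`mlsa_predict`, `tolerance`) are hypothetical; the actual MLSA may select the tolerance and the aggregation rule differently.

```python
import numpy as np

def mlsa_predict(hypotheses, emp_risks, x, tolerance):
    """Median-of-level-set aggregation (illustrative sketch, not the paper's code).

    hypotheses : list of callables h(x) returning a real-valued prediction
    emp_risks  : per-hypothesis empirical risks L_S(h) on the fixed sample S
    x          : query point to predict at
    tolerance  : level-set width around the ERM's empirical risk
    """
    emp_risks = np.asarray(emp_risks, dtype=float)
    best = emp_risks.min()  # empirical risk of the ERM
    # Level set: hypotheses whose empirical risk is within `tolerance` of the ERM's.
    level_set = [h for h, r in zip(hypotheses, emp_risks) if r <= best + tolerance]
    preds = np.array([h(x) for h in level_set])
    return np.median(preds)  # aggregate the near-minimizers by their median

# Toy usage: constant predictors with synthetic empirical risks minimized near c = 0.3.
cs = np.linspace(0.0, 1.0, 11)
hs = [lambda x, c=c: c for c in cs]
rs = (cs - 0.3) ** 2
print(mlsa_predict(hs, rs, x=None, tolerance=0.05))  # median of {0.1,...,0.5} -> 0.3
```

Taking a median rather than a mean keeps the aggregate inside the range of the level-set predictions and makes it robust to a few outlying near-minimizers.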