We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, the wild bootstrap for Maximum Mean Discrepancy U-statistics, and the recently proposed SymmPI. DAB also recovers the classical bootstrap, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to matching conditional means and variances. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically evaluate the incorporation of DA into the bootstrap, the wild bootstrap, and conformal prediction in simulated settings as well as on image, language, and scientific data.
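As a rough illustration of the resampling idea summarized above, the following minimal Python sketch draws data indices uniformly with replacement, as in the classical bootstrap, and optionally applies a random transformation to each resampled point before recomputing the statistic. It is not the DAB procedure itself, and it makes no coverage claim. The names `resampled_percentile_interval`, `augment`, and `random_sign_flip` are hypothetical, and the sign-flip symmetry is an assumption chosen purely for illustration.

```python
import random
import statistics


def resampled_percentile_interval(data, statistic, augment=None,
                                  n_resamples=2000, alpha=0.05, seed=0):
    """Return empirical (alpha/2, 1 - alpha/2) quantiles of a resampled statistic."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Classical bootstrap step: draw n indices uniformly with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Hypothetical augmentation step: apply a random transformation under
        # which the data distribution is assumed to be approximately invariant.
        if augment is not None:
            sample = [augment(x, rng) for x in sample]
        replicates.append(statistic(sample))
    replicates.sort()
    lower = replicates[int((alpha / 2) * n_resamples)]
    upper = replicates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


def random_sign_flip(x, rng):
    # Illustrative assumption: the data distribution is symmetric about zero.
    return x if rng.random() < 0.5 else -x


if __name__ == "__main__":
    gen = random.Random(1)
    observations = [gen.gauss(0.0, 1.0) for _ in range(50)]
    print(resampled_percentile_interval(observations, statistics.mean))
    print(resampled_percentile_interval(observations, statistics.mean,
                                        augment=random_sign_flip))
```

With `augment=None` the function reduces to a plain percentile bootstrap. Passing a transformation changes the resampling distribution of the statistic, and whether the resulting interval is reliable depends on how closely the assumed invariance holds.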