The paper provides data-dependent bounds on the test error of the Gibbs algorithm in the overparameterized interpolation regime, where low training error is attained even on impossible data, such as random labels in classification. The bounds remain stable under approximation of the Gibbs posterior by Langevin Monte Carlo algorithms. Experiments on the MNIST and CIFAR-10 datasets verify that the bounds yield nontrivial predictions on data with true labels and correctly upper-bound the test error under random labels. The method indicates that generalization in the low-temperature interpolation regime is already signaled by small training errors in the more classical high-temperature regime.
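For orientation, here is a minimal sketch of the objects involved, in our own notation rather than the paper's: the Gibbs algorithm at inverse temperature \(\beta\) samples weights \(w\) from the Gibbs posterior over the training error, and unadjusted Langevin Monte Carlo approximates such a sample by noisy gradient descent on the empirical risk,
\[
\pi_\beta(w) \;\propto\; \exp\!\bigl(-\beta\,\hat{L}_S(w)\bigr),
\qquad
w_{k+1} \;=\; w_k \;-\; \eta\,\nabla \hat{L}_S(w_k) \;+\; \sqrt{2\eta/\beta}\;\xi_k,
\quad \xi_k \sim \mathcal{N}(0, I),
\]
where \(\hat{L}_S\) is the training error on the sample \(S\), \(\eta\) a step size, and \(\xi_k\) standard Gaussian noise. Low temperature corresponds to large \(\beta\), which concentrates \(\pi_\beta\) on near-minimizers of \(\hat{L}_S\), i.e., on interpolating solutions in the overparameterized regime.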