In this paper, we present a distribution-dependent PAC-Chernoff bound that is perfectly tight for interpolators, even under overparameterized model classes. This bound relies on basic principles of Large Deviation Theory and naturally provides a characterization of the smoothness of a model, described as a simple real-valued function. Based on this distribution-dependent bound and the novel definition of smoothness, we propose a unifying theoretical explanation of why some interpolators generalize remarkably well while others do not, and of why a wide range of modern learning techniques (i.e., $\ell_2$-norm, distance-from-initialization, input-gradient and variance regularization, together with data augmentation, invariant architectures, and overparameterization) are able to find them. The emergent conclusion is that all these methods provide complementary procedures that bias the optimizer toward smoother interpolators, which, according to this theoretical analysis, are the ones with better generalization error. One of the main insights of this study is that distribution-dependent bounds serve as a powerful tool to better understand the complex dynamics behind the generalization capabilities of highly overparameterized interpolators.