This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators, even within over-parameterized model classes. This bound, which relies on basic principles of Large Deviation Theory, defines a natural measure of the smoothness of a model, characterized by simple real-valued functions. Building upon this bound and the new concept of smoothness, we present a unified theoretical framework revealing why certain interpolators generalize exceptionally well while others falter. We theoretically show how a wide spectrum of modern learning methodologies, encompassing techniques such as $\ell_2$-norm, distance-from-initialization, and input-gradient regularization, in combination with data augmentation, invariant architectures, and over-parameterization, collectively guide the optimizer toward smoother interpolators, which, according to our theoretical framework, are the ones exhibiting superior generalization performance. This study shows that distribution-dependent bounds serve as a powerful tool to understand the complex dynamics behind the generalization capabilities of over-parameterized interpolators.