This paper introduces a distribution-dependent PAC-Chernoff bound that is perfectly tight for interpolators, even within over-parameterized model classes. The bound, which rests on basic principles of Large Deviation Theory, induces a natural measure of model smoothness, characterized by simple real-valued functions. Building on this bound and the new notion of smoothness, we present a unified theoretical framework explaining why certain interpolators generalize exceptionally well while others falter. We show theoretically that a wide spectrum of modern learning techniques, including $\ell_2$-norm, distance-from-initialization, and input-gradient regularization, combined with data augmentation, invariant architectures, and over-parameterization, collectively guides the optimizer toward smoother interpolators, which, according to our framework, are precisely the ones with superior generalization performance. This study shows that distribution-dependent bounds are a powerful tool for understanding the complex dynamics behind the generalization of over-parameterized interpolators.
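To make the Large Deviation Theory connection concrete, the following display sketches the generic Cramér–Chernoff construction that bounds of this kind build on; the notation ($\ell$ for the per-example loss, $J_\theta$ for the cumulant generating function, $I_\theta$ for the rate function, $\nu$ for the data distribution) is illustrative and need not match the paper's exact statement. For a fixed model $\theta$ with expected loss $L(\theta)$ and empirical loss $\hat{L}(\theta)$ over $n$ i.i.d. samples,

\[
J_\theta(\lambda) \;=\; \log \mathbb{E}_{x \sim \nu}\!\big[e^{\lambda\,(L(\theta) - \ell(\theta, x))}\big],
\qquad
I_\theta(a) \;=\; \sup_{\lambda > 0}\,\big(\lambda a - J_\theta(\lambda)\big),
\]

and Chernoff's method gives $\Pr\big(L(\theta) - \hat{L}(\theta) \ge a\big) \le e^{-n I_\theta(a)}$, so with probability at least $1-\delta$,

\[
L(\theta) \;\le\; \hat{L}(\theta) + I_\theta^{-1}\!\Big(\tfrac{1}{n}\ln\tfrac{1}{\delta}\Big).
\]

For an interpolator, $\hat{L}(\theta) = 0$, and the bound is controlled entirely by the distribution-dependent rate function $I_\theta$, suggesting how a simple real-valued function of the model can act as the smoothness measure described above.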
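As a complementary illustration of the smoothness-inducing techniques listed above, here is a minimal sketch of the three regularizers expressed as loss-level penalties. It assumes a PyTorch classifier `model`, a snapshot `init_params` of its parameters at initialization, and illustrative weights `lam_l2`, `lam_init`, `lam_grad`; none of these names come from the paper, and this is not its experimental code.

```python
import torch
import torch.nn.functional as F

def regularized_loss(model, init_params, x, y,
                     lam_l2=1e-4, lam_init=1e-4, lam_grad=1e-4):
    """Cross-entropy plus three smoothness-promoting penalty terms.

    init_params: detached copies of model.parameters() taken at
    initialization (hypothetical helper state; lam_* are illustrative).
    """
    # Track gradients w.r.t. the inputs for the input-gradient penalty.
    x = x.clone().requires_grad_(True)
    logits = model(x)
    data_loss = F.cross_entropy(logits, y)

    # L2-norm regularization: squared norm of all parameters.
    l2 = sum(p.pow(2).sum() for p in model.parameters())

    # Distance-from-initialization: squared drift from the initial weights.
    dist = sum((p - p0).pow(2).sum()
               for p, p0 in zip(model.parameters(), init_params))

    # Input-gradient regularization: sensitivity of the loss to the inputs;
    # create_graph=True lets the penalty itself be backpropagated.
    (input_grad,) = torch.autograd.grad(data_loss, x, create_graph=True)
    grad_pen = input_grad.pow(2).sum()

    return data_loss + lam_l2 * l2 + lam_init * dist + lam_grad * grad_pen
```

A typical snapshot is `init_params = [p.detach().clone() for p in model.parameters()]`, taken once before training begins.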