In this paper, we present a novel characterization of the smoothness of a model based on basic principles of Large Deviation Theory. In contrast to prior work, where the smoothness of a model is normally characterized by a single real value (e.g., the norm of the weights), we show that smoothness can be described by a simple real-valued function. Based on this concept of smoothness, we propose a unifying theoretical explanation of why some interpolators generalize remarkably well and why a wide range of modern learning techniques (i.e., stochastic gradient descent, $\ell_2$-norm regularization, data augmentation, invariant architectures, and overparameterization) are able to find them. The emergent conclusion is that all these methods provide complementary procedures that bias the optimizer toward smoother interpolators, which, according to this theoretical analysis, are the ones with better generalization error.