Understanding the generalization abilities of modern machine learning algorithms has been a major research topic over the past decades. In recent years, the learning dynamics of Stochastic Gradient Descent (SGD) have been related to heavy-tailed dynamics. This connection has been successfully applied to generalization theory by exploiting the fractal properties of these dynamics. However, the resulting bounds depend on mutual information (decoupling) terms that cannot be computed in practice. In this work, we prove generalization bounds over the trajectory of a class of heavy-tailed dynamics without these mutual information terms. Instead, we introduce a geometric decoupling term obtained by comparing the learning dynamics (which depend on the empirical risk) with expected dynamics (which depend on the population risk). We further upper-bound this geometric term using techniques from the heavy-tailed and fractal literature, making it fully computable. Moreover, in an attempt to tighten the bounds, we propose a PAC-Bayesian setting based on perturbed dynamics, in which the same geometric term plays a crucial role and can still be bounded using the techniques described above.