Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention in recent years. Prior works have illuminated interesting aspects of stochastic optimizers by using heavy-tailed stochastic differential equations (SDEs) as proxies, but they either provided generalization bounds that hold only in expectation or introduced non-computable information-theoretic terms. To address these drawbacks, we prove high-probability generalization bounds for heavy-tailed SDEs that do not contain any nontrivial information-theoretic terms. To achieve this, we develop new proof techniques based on estimating the entropy flows associated with the so-called fractional Fokker-Planck equation (a partial differential equation that governs the evolution of the distribution of the corresponding heavy-tailed SDE). In addition to holding with high probability, our bounds have a better dependence on the parameter dimension than prior art. Our results further identify a phase transition phenomenon, which suggests that heavy tails can be either beneficial or harmful depending on the problem structure. We support our theory with experiments conducted in a variety of settings.
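For orientation, one commonly used form of a heavy-tailed SDE and its fractional Fokker-Planck equation is sketched below. This is an illustrative assumption and not necessarily the exact model analyzed in this work: the symbols $F$ (an objective or potential), $\sigma$ (a noise scale), $\alpha$ (the tail index), and the normalization of the rotationally symmetric $\alpha$-stable Lévy process $L_t^{\alpha}$ are all choices made here for the sketch.

```latex
% Assumed setup (hypothetical notation): X_t \in \mathbb{R}^d, \alpha \in (0, 2),
% and L_t^{\alpha} normalized so that
%   \mathbb{E}\, e^{i \langle \xi, L_t^{\alpha} \rangle} = e^{-t \|\xi\|^{\alpha}}.

% Heavy-tailed SDE: gradient drift plus alpha-stable Levy noise.
\[
  \mathrm{d}X_t = -\nabla F(X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}L_t^{\alpha}
\]

% Fractional Fokker-Planck equation for the density p(x, t) of X_t:
% the usual Laplacian diffusion term is replaced by a fractional Laplacian.
\[
  \partial_t p(x, t)
    = \nabla \cdot \bigl( p(x, t)\,\nabla F(x) \bigr)
      - \sigma^{\alpha}\,(-\Delta)^{\alpha/2} p(x, t)
\]
```

Under this normalization, letting $\alpha \to 2$ formally recovers the classical Fokker-Planck equation with diffusion term $\sigma^{2} \Delta p$, while smaller $\alpha$ corresponds to heavier-tailed noise.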