We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.