We adopt an information-theoretic framework to analyze the generalization behavior of iterative, noisy learning algorithms. This class is particularly well suited to study with information-theoretic metrics because its algorithms are inherently randomized, and it includes commonly used algorithms such as Stochastic Gradient Langevin Dynamics (SGLD). We use maximal leakage (equivalently, the Sibson mutual information of order infinity) as the metric, since it is simple to analyze and yields bounds both on the probability of a large generalization error and on its expected value. We show that if the update function (e.g., the gradient) is bounded in $L_2$-norm and the additive noise is isotropic Gaussian, then an upper bound on the maximal leakage can be obtained in semi-closed form. Furthermore, we show how the assumptions on the update function affect the optimal choice of noise, in the sense of minimizing the induced maximal leakage. Finally, we compute explicit, tight upper bounds on the induced maximal leakage for other scenarios of interest.
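To make the bounded-update, isotropic-Gaussian setting concrete, here is a minimal sketch (our own illustration, not the paper's derivation) for a single noisy step $Y = m + N$, where the update $m$ may be any vector with $\|m\|_2 \le C$ and $N \sim \mathcal{N}(0, \sigma^2 I_d)$. It uses the standard expression for the maximal leakage of a channel with densities, $\mathcal{L}(X \to Y) = \log \int \sup_x p(y \mid x)\, dy$. Because the supremum of the Gaussian density over the ball is attained at distance $\max(\|y\|_2 - C, 0)$ from $y$, the integral splits into the volume of the ball plus a radial term that reduces to Gamma functions. The function name `gaussian_ball_leakage` and the parameters `C`, `sigma`, `d` are hypothetical names chosen for this example.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_ball_leakage(C: float, sigma: float, d: int) -> float:
    """Maximal leakage (in nats) of one step Y = m + N, where m is any vector
    with ||m||_2 <= C and N ~ N(0, sigma^2 I_d).

    Computes log of the integral over y of sup_{||m|| <= C} phi_sigma(y - m),
    where phi_sigma is the isotropic Gaussian density (hypothetical helper,
    written for illustration only).
    """
    if C < 0 or sigma <= 0 or d < 1:
        raise ValueError("need C >= 0, sigma > 0, d >= 1")
    if C == 0:
        return 0.0  # output does not depend on the input: no leakage

    log_2s2 = math.log(2 * sigma**2)
    # Inside the ball the density peak is attained everywhere: contributes Vol(B_d(C)).
    log_vol = (d / 2) * math.log(math.pi) + d * math.log(C) - math.lgamma(d / 2 + 1)
    # Outside the ball: surface area of the unit sphere times
    # int_0^inf (t + C)^(d-1) exp(-t^2 / (2 sigma^2)) dt, expanded binomially via
    # int_0^inf t^k exp(-t^2 / (2 sigma^2)) dt = 0.5 * (2 sigma^2)^((k+1)/2) * Gamma((k+1)/2).
    log_surf = math.log(2) + (d / 2) * math.log(math.pi) - math.lgamma(d / 2)
    radial_terms = [
        math.log(math.comb(d - 1, k))
        + (d - 1 - k) * math.log(C)
        + math.log(0.5)
        + ((k + 1) / 2) * log_2s2
        + math.lgamma((k + 1) / 2)
        for k in range(d)
    ]
    log_outside = log_surf + _logsumexp(radial_terms)
    # Add both pieces, then multiply by the Gaussian normalizer (2 pi sigma^2)^(-d/2).
    log_integral = _logsumexp([log_vol, log_outside])
    return log_integral - (d / 2) * math.log(2 * math.pi * sigma**2)
```

For example, `gaussian_ball_leakage(1.0, 1.0, 1)` evaluates to $\log(1 + \sqrt{2/\pi}) \approx 0.587$ nats, the familiar scalar amplitude-constrained case. Everything is computed in log-space so that the binomial and Gamma terms do not overflow in high dimension. Bounding a full $T$-step run would additionally require combining the per-step values, for instance through the adaptive-composition property of maximal leakage; that step is an assumption not shown here.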