We adopt an information-theoretic framework to analyze the generalization behavior of iterative, noisy learning algorithms. This class is particularly well suited to study under information-theoretic metrics because its algorithms are inherently randomized, and it includes commonly used algorithms such as Stochastic Gradient Langevin Dynamics (SGLD). We use maximal leakage (equivalently, the Sibson mutual information of order infinity) as the metric, since it is simple to analyze and implies bounds on both the probability of a large generalization error and its expected value. We show that, if the update function (e.g., the gradient) is bounded in $L_2$-norm, then adding isotropic Gaussian noise leads to optimal generalization bounds: indeed, in this case the input and output of the learning algorithm are asymptotically statistically independent. Furthermore, we demonstrate how the assumptions on the update function affect the optimal choice of noise, where optimal means minimizing the induced maximal leakage. Finally, we compute explicit tight upper bounds on the induced maximal leakage for several scenarios of interest.
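For intuition about the metric, the following is a minimal sketch of maximal leakage on finite alphabets, where it has the known closed form $\log \sum_y \max_x P_{Y|X}(y|x)$, with the maximum taken over inputs of positive probability. The analysis summarized above concerns continuous Gaussian noise, so this discrete example is only an illustration; the function name `maximal_leakage` and the two toy channels are assumptions introduced here.

```python
import math


def maximal_leakage(channel):
    """Maximal leakage (in nats) of a finite channel.

    channel[x][y] = P(Y = y | X = x). Every row is assumed to be a
    probability vector, and every input x is assumed to occur with
    positive probability, so the maximum runs over all rows.
    """
    num_outputs = len(channel[0])
    # For each output symbol y, take the largest conditional probability
    # over all inputs x, then sum these maxima and take the logarithm.
    column_maxima = [max(row[y] for row in channel) for y in range(num_outputs)]
    return math.log(sum(column_maxima))


# Hypothetical toy channels (illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]         # output reveals the input exactly
independent = [[0.25, 0.75], [0.25, 0.75]]  # output ignores the input

leak_identity = maximal_leakage(identity)
leak_independent = maximal_leakage(independent)
```

The independent channel has zero leakage, which is the discrete analogue of the statistical independence between input and output mentioned above; the identity channel leaks the full $\log 2$ nats.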