We study the generalization error of stochastic learning algorithms from an information-theoretic perspective, with a particular emphasis on deriving sharper bounds for differentially private algorithms. It is well known that the generalization error of stochastic learning algorithms can be bounded in terms of mutual information and maximal leakage, yielding in-expectation and high-probability guarantees, respectively. In this work, we further upper bound mutual information and maximal leakage by explicit, easily computable formulas, using typicality-based arguments and exploiting the stability properties of private algorithms. In the first part of the paper, we strictly improve the mutual-information bounds by Rodríguez-Gálvez et al. (IEEE Trans. Inf. Theory, 2021). In the second part, we derive new upper bounds on the maximal leakage of learning algorithms. In both cases, the resulting bounds on information measures translate directly into generalization error guarantees.