We study the generalization error of stochastic learning algorithms from an information-theoretic perspective, with a particular emphasis on deriving sharper bounds for differentially private algorithms. It is well known that the generalization error of stochastic learning algorithms can be bounded in terms of mutual information and maximal leakage, yielding in-expectation and high-probability guarantees, respectively. In this work, we further upper bound mutual information and maximal leakage by explicit, easily computable formulas, using typicality-based arguments and exploiting the stability properties of private algorithms. In the first part of the paper, we strictly improve the mutual-information bounds by Rodríguez-Gálvez et al. (IEEE Trans. Inf. Theory, 2021). In the second part, we derive new upper bounds on the maximal leakage of learning algorithms. In both cases, the resulting bounds on information measures translate directly into generalization error guarantees.
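For concreteness, the well-known in-expectation bound referenced above is the mutual-information bound of Xu and Raginsky: assuming the loss \(\ell(w, Z)\) is \(\sigma\)-subgaussian under the data distribution for every hypothesis \(w\), the expected generalization gap of an algorithm with output \(W\) trained on a sample \(S = (Z_1, \dots, Z_n)\) satisfies (a standard statement, not this paper's new result):

```latex
\left| \mathbb{E}\big[\mathrm{gen}(S, W)\big] \right|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)},
\qquad
\mathrm{gen}(S, W) \;=\; \mathbb{E}_{Z}\big[\ell(W, Z)\big]
  \;-\; \frac{1}{n}\sum_{i=1}^{n} \ell(W, Z_i).
```

Bounds of this form are the reason that explicit, computable upper bounds on \(I(S;W)\) (and, analogously, on maximal leakage for high-probability guarantees) translate directly into generalization guarantees.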