Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct, complexity measure to control generalization error: the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.
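To make the central quantity concrete, the following is a minimal sketch of how the empirical Rademacher complexity of a finite hypothesis class could be estimated by Monte Carlo sampling of random signs. It does not reproduce the algorithm- and data-dependent class constructed in the paper: the class is simply assumed to be given as a loss matrix, and the names `empirical_rademacher`, `massart_bound`, `losses`, and `num_draws` are hypothetical. Massart's finite-class lemma is included only as one example of a standard property of Rademacher complexity, not necessarily the one used in the proofs.

```python
import numpy as np


def empirical_rademacher(losses, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    losses: array of shape (num_hypotheses, n), where losses[k, i] is the
            loss of hypothesis k on sample i (an assumed representation of
            a finite hypothesis class on a fixed sample).
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[1]
    rng = np.random.default_rng(seed)
    # Independent uniform signs sigma in {-1, +1}, one row per draw.
    sigma = rng.choice([-1.0, 1.0], size=(num_draws, n))
    # correlations[d, k] = (1/n) * sum_i sigma[d, i] * losses[k, i]
    correlations = sigma @ losses.T / n
    # Supremum over the finite class, then average over the sign draws.
    return correlations.max(axis=1).mean()


def massart_bound(losses):
    """Massart's lemma: max_k ||losses[k]||_2 * sqrt(2 log K) / n."""
    losses = np.asarray(losses, dtype=float)
    num_hypotheses, n = losses.shape
    radius = np.linalg.norm(losses, axis=1).max()
    return radius * np.sqrt(2.0 * np.log(num_hypotheses)) / n


if __name__ == "__main__":
    # Hypothetical data: 8 hypotheses with 0/1 losses on 50 samples.
    demo_rng = np.random.default_rng(1)
    demo_losses = demo_rng.integers(0, 2, size=(8, 50))
    print("estimate:", empirical_rademacher(demo_losses))
    print("Massart bound:", massart_bound(demo_losses))
```

In this sketch the supremum is a maximum over rows, which is only possible because the class is finite; this is the kind of setting in which finite-class arguments apply directly.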