Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings using tools from fractal geometry. While these works have successfully introduced new mathematical tools for understanding generalization, they rely heavily on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build on a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Although it introduces significant technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation of the newly introduced MI terms, we next introduce a notion of "geometric stability" and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted in various settings.