We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural networks (DNNs). DNNs show remarkable generalization properties, yet the source of these capabilities remains elusive, defying established statistical learning theory. Recent studies have revealed that properties of training trajectories can be indicative of generalization. Building on this insight, state-of-the-art methods have leveraged the topology of these trajectories, particularly their fractal dimension, to quantify generalization. Most existing works compute this quantity by assuming continuous- or infinite-time training dynamics, complicating the development of practical estimators capable of accurately predicting generalization without access to test data. In this paper, we respect the discrete-time nature of training trajectories and investigate the underlying topological quantities that are amenable to topological data analysis tools. This leads to a new family of reliable topological complexity measures that provably bound the generalization error, eliminating the need for restrictive geometric assumptions. These measures are computationally friendly, enabling us to propose simple yet effective algorithms for computing generalization indices. Moreover, our flexible framework can be extended to different domains, tasks, and architectures. Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers, highlighting the practical relevance and effectiveness of our complexity measures.
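As one illustration of the kind of trajectory-based topological quantity the abstract alludes to, the sketch below computes an α-weighted sum of 0-dimensional persistence lifetimes over a finite set of training iterates. It relies on the standard fact that the 0-dimensional Vietoris–Rips lifetimes of a point cloud coincide with the edge lengths of its Euclidean minimum spanning tree. The function name and this particular estimator are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def alpha_weighted_lifetime_sum(trajectory, alpha=1.0):
    """Toy 0-dimensional topological complexity of a discrete training trajectory.

    trajectory: array of shape (n_iterates, n_params), each row a flattened
                snapshot of the model weights at one optimization step.
    alpha:      weighting exponent applied to each persistence lifetime.
    """
    # Pairwise Euclidean distances between iterates.
    dist_matrix = squareform(pdist(trajectory))
    # Edges of the Euclidean MST equal the 0-dim Vietoris-Rips lifetimes.
    mst = minimum_spanning_tree(dist_matrix).toarray()
    lifetimes = mst[mst > 0]
    return float(np.sum(lifetimes ** alpha))

# Example: three 1-D iterates at 0, 1, and 3; MST edges are 1 and 2.
traj = np.array([[0.0], [1.0], [3.0]])
print(alpha_weighted_lifetime_sum(traj, alpha=1.0))  # → 3.0
```

Varying α interpolates between counting connected components (small α) and emphasizing the longest edges of the trajectory (large α), which is why such sums can act as scale-sensitive complexity indices.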