Understanding the generalization behavior of deep neural networks remains a fundamental challenge in modern statistical learning theory. Among existing approaches, PAC-Bayesian norm-based bounds have demonstrated particular promise due to their data-dependent nature and their ability to capture algorithmic and geometric properties of learned models. However, most existing results rely on isotropic Gaussian posteriors, heavy use of spectral-norm concentration for weight perturbations, and largely architecture-agnostic analyses, which together limit both the tightness and the practical relevance of the resulting bounds. To address these limitations, we propose a unified framework for PAC-Bayesian norm-based generalization analysis by reformulating the derivation of generalization bounds as a stochastic optimization problem over anisotropic Gaussian posteriors. The key to our approach is a sensitivity matrix that quantifies the sensitivity of the network outputs to structured weight perturbations, enabling the explicit incorporation of heterogeneous parameter sensitivities and architectural structure. By imposing different structural assumptions on this sensitivity matrix, we derive a family of generalization bounds that recover several existing PAC-Bayesian results as special cases, while yielding bounds comparable to, or tighter than, those of state-of-the-art approaches. This unified framework provides a principled and flexible foundation for geometry- and structure-aware, interpretable generalization analysis in deep learning.
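For context, the following is a minimal sketch of the standard PAC-Bayesian ingredients the abstract builds on; the bound shown is the classical McAllester-style result and the Gaussian KL identity is standard, while the paper's sensitivity-matrix construction itself is not reproduced here. For a prior $P$ and posterior $Q$ over weights, with probability at least $1-\delta$ over an i.i.d. sample of size $n$,
\[
L(Q) \;\le\; \hat{L}_n(Q) + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\left(2\sqrt{n}/\delta\right)}{2n}},
\]
and for an anisotropic Gaussian posterior $Q = \mathcal{N}(w, \Sigma)$ with isotropic Gaussian prior $P = \mathcal{N}(0, \sigma^2 I_d)$,
\[
\mathrm{KL}(Q \,\|\, P) = \frac{1}{2}\left[\frac{\operatorname{tr}(\Sigma) + \|w\|_2^2}{\sigma^2} - d + d\ln\sigma^2 - \ln\det\Sigma\right].
\]
Minimizing the right-hand side of the bound over the covariance $\Sigma$, subject to constraints encoding how the network outputs respond to structured weight perturbations, is the kind of stochastic optimization problem over anisotropic posteriors that the abstract refers to.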