We propose a general framework for deriving generalization bounds for parallel positively homogeneous neural networks, a class of networks whose input-output map decomposes as a sum of positively homogeneous maps. Examples of such networks include matrix factorization and sensing, single-layer multi-head attention mechanisms, tensor factorization, and deep linear and ReLU networks, among others. Our framework links the non-convex empirical risk minimization (ERM) problem to a closely related convex optimization problem over prediction functions, which provides a global, achievable lower bound on the ERM objective. We exploit this convex lower bound to carry out the generalization analysis in the convex space while controlling the discrepancy between the convex model and its non-convex counterpart. We apply the framework to a wide variety of models, including low-rank matrix sensing, structured matrix sensing, two-layer linear networks, two-layer ReLU networks, and single-layer multi-head attention mechanisms, obtaining generalization bounds whose sample complexity scales almost linearly with the network width.
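As a minimal sketch of the structure assumed here (the symbols $f$, $\phi$, $\theta_i$, $r$, and the homogeneity degree $L$ are illustrative notation, not necessarily the paper's), a parallel positively homogeneous network can be written as
\[
f(x;\theta) \;=\; \sum_{i=1}^{r} \phi(x;\theta_i),
\qquad
\phi(x;\alpha\theta_i) \;=\; \alpha^{L}\,\phi(x;\theta_i)
\quad \text{for all } \alpha \ge 0 .
\]
For instance, a width-$r$ two-layer ReLU network $f(x) = \sum_{i=1}^{r} v_i\,[w_i^\top x]_+$ fits this template with $\theta_i = (w_i, v_i)$ and degree $L = 2$, since scaling $(w_i, v_i)$ by $\alpha \ge 0$ scales the $i$-th summand by $\alpha^2$.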