Generalization and Stability of Interpolating Neural Networks with Minimal Width

March 27, 2023


Hossein Taheri, Christos Thrampoulidis

From arXiv. Comments: With significant changes: stating results without the homogeneity assumption, and discussing results under NTK-separability in Section 4.

We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training samples achieves training error $O(g(1/T)^2 /T)$ and generalization error $O(g(1/T)^2 /n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses a special case where data are separable by the model's neural tangent kernel. In this case, for logistic-loss minimization, we prove that the training loss decays at a rate of $\tilde O(1/ T)$ given a polylogarithmic number of neurons $m=\Omega(\log^4 (T))$. Moreover, with $m=\Omega(\log^{4} (n))$ neurons and $T\approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$. Our results differ from existing generalization results based on the algorithmic-stability framework, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is the use of a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Ultimately, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression.
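The setting described in the abstract can be illustrated with a small numerical sketch: full-batch gradient descent on the logistic loss of a one-hidden-layer network, tracking the training loss and the distance of the weights from initialization. Everything concrete below is an assumption made for illustration and is not taken from the paper: the softplus activation, the fixed random-sign output layer with $1/\sqrt{m}$ scaling, the synthetic linearly separable data, and the values of `m`, `T`, and `eta`. It is a toy experiment, not a reproduction of the paper's results.

```python
import numpy as np

def softplus(z):
    # Smooth activation (an assumption of this sketch): log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def sigmoid(z):
    # Numerically stable logistic function; it is the derivative of softplus.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def predict(W, a, X):
    # f(x) = (1/sqrt(m)) * sum_j a_j * softplus(<w_j, x>); only W is trained.
    return softplus(X @ W.T) @ a / np.sqrt(W.shape[0])

def train_loss(W, a, X, y):
    # Average logistic loss log(1 + exp(-y * f(x))) over the training set.
    return np.mean(np.logaddexp(0.0, -y * predict(W, a, X)))

def train_grad(W, a, X, y):
    # Gradient of train_loss with respect to the first-layer weights W.
    n, m = X.shape[0], W.shape[0]
    pre = X @ W.T                                  # pre-activations, shape (n, m)
    margins = y * (softplus(pre) @ a / np.sqrt(m))
    dloss_df = -y * sigmoid(-margins)              # d loss_i / d f(x_i), shape (n,)
    df_dpre = sigmoid(pre) * a / np.sqrt(m)        # d f(x_i) / d pre_ij, shape (n, m)
    return (df_dpre * dloss_df[:, None]).T @ X / n

def make_data(n=100, d=5, seed=1):
    # Hypothetical toy data: unit-norm inputs labeled by a fixed linear rule.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    w_star = np.ones(d) / np.sqrt(d)
    y = np.where(X @ w_star >= 0, 1.0, -1.0)
    return X, y

def run_gd(X, y, m=64, T=200, eta=1.0, seed=0):
    # Full-batch gradient descent from a random Gaussian initialization.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, X.shape[1]))
    a = rng.choice([-1.0, 1.0], size=m)            # fixed output layer (assumption)
    W0 = W.copy()
    losses = [train_loss(W, a, X, y)]
    for _ in range(T):
        W = W - eta * train_grad(W, a, X, y)
        losses.append(train_loss(W, a, X, y))
    return np.array(losses), np.linalg.norm(W - W0)

X, y = make_data()
losses, dist = run_gd(X, y)
print("initial loss:", losses[0])
print("final loss:", losses[-1])
print("distance from initialization:", dist)
```

Printing `losses` at a few iteration counts alongside `dist` gives a rough feel for how the training loss and the distance from initialization (the role played by $g(\epsilon)$ in the abstract) evolve together; the sketch makes no claim about matching the stated rates.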

