Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks - 专知论文

会员服务 ·

0

SGD · 不变 · 泛化理论 · 噪声 · Analysis ·

2023 年 6 月 7 日

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

翻译：随机坍缩：梯度噪声如何引导SGD动力学趋向更简子网络

Feng Chen,Daniel Kunin,Atsushi Yamamura,Surya Ganguli

from arxiv, 30 pages, 10 figures

In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant sets that correspond to simpler subnetworks and commonly appear in modern architectures. Our analysis uncovers that SGD exhibits a property of stochastic attractivity towards these simpler invariant sets. We establish a sufficient condition for stochastic attractivity based on a competition between the loss landscape's curvature around the invariant set and the noise introduced by stochastic gradients. Remarkably, we find that an increased level of noise strengthens attractivity, leading to the emergence of attractive invariant sets associated with saddle-points or local maxima of the train loss. We observe empirically the existence of attractive invariant sets in trained deep neural networks, implying that SGD dynamics often collapses to simple subnetworks with either vanishing or redundant neurons. We further demonstrate how this simplifying process of stochastic collapse benefits generalization in a linear teacher-student framework. Finally, through this analysis, we mechanistically explain why early training with large learning rates for extended periods benefits subsequent generalization.

翻译：本文揭示了随机梯度下降（SGD）的一种强隐式偏差，促使过度表达的网络收敛至更简子网络，从而显著减少独立参数数量并提升泛化能力。为揭示此偏差，我们识别出SGD更新过程中保持不变的不变集（参数空间的子集），聚焦于两类与更简子网络对应且常见于现代架构的不变集。分析表明，SGD对这些更简不变集具有随机吸引性。我们基于损失景观在不变集周围的曲率与随机梯度引入的噪声之间的竞争关系，建立了随机吸引性的充分条件。值得注意的是，噪声水平的增强会强化吸引性，导致与训练损失鞍点或局部最大值相关联的吸引性不变集出现。实验观测表明，训练深度神经网络中存在吸引性不变集，这意味着SGD动力学常坍缩至含有消失或冗余神经元的简单子网络。我们进一步在线性教师-学生框架下证明，这种随机坍缩的简化过程如何促进泛化。最后，通过该分析，我们从机理层面解释了早期采用大学习率进行长时训练为何有利于后续泛化。

0

相关内容

SGD

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

锁模运转的低阈值掺铥激光器

国家自然科学基金

0+阅读 · 2014年12月31日

DACI1 调控Cyt b6/f 复合物组装的功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于掺镱光纤宽带光源泵浦产生的超连续谱研究

国家自然科学基金

0+阅读 · 2013年12月31日

AhR和VitA/RA途径在二噁英诱发腭裂中的相互作用

国家自然科学基金

0+阅读 · 2013年12月31日

MicRNA107调控BACE1mRNA基因与阿尔茨海默病内质网应激病理机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

新兴市场国家IFRS制定过程中的博弈及经济后果研究

国家自然科学基金

1+阅读 · 2012年12月31日

保守振动方程周期解的存在性研究

国家自然科学基金

0+阅读 · 2012年12月31日

Rydberg Blockade条件下的量子相干与量子信息处理的研究

国家自然科学基金

0+阅读 · 2012年12月31日

Bose-Hubbard模型量子相变的数值研究

国家自然科学基金

0+阅读 · 2011年12月31日

非线性约束预测控制研究及其在电站节能控制中的应用

国家自然科学基金

0+阅读 · 2009年12月31日

The Marginal Value of Momentum for Small Learning Rate SGD

Arxiv

0+阅读 · 2023年7月27日

Dynamics of specialization in neural modules under resource constraints

Arxiv

0+阅读 · 2023年7月27日

Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification

Arxiv

0+阅读 · 2023年7月27日

On the Learning Dynamics of Attention Networks

Arxiv

0+阅读 · 2023年7月26日

The Information Bottleneck's Ordinary Differential Equation: First-Order Root-Tracking for the IB

Arxiv

0+阅读 · 2023年7月25日

A Survey of Quantization Methods for Efficient Neural Network Inference

Arxiv

22+阅读 · 2021年6月21日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

Bayesian Deep Learning via Subnetwork Inference

Arxiv

10+阅读 · 2021年2月18日

Subgraph Neural Networks

Arxiv

27+阅读 · 2020年6月19日

How Powerful are Graph Neural Networks?

Arxiv

23+阅读 · 2018年10月1日

VIP会员

文章信息

相关主题

最新内容

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

5+阅读 · 今天8:00

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

3+阅读 · 今天7:44

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

3+阅读 · 今天7:28

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

4+阅读 · 今天7:18

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

专知会员服务

5+阅读 · 今天7:07

军事欺骗：供作战战术指挥官使用的工具

军事欺骗：供作战战术指挥官使用的工具

专知会员服务

4+阅读 · 今天7:03

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

4+阅读 · 6月23日

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

5+阅读 · 6月23日

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

10+阅读 · 6月23日

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

4+阅读 · 6月23日

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

5+阅读 · 6月23日

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

8+阅读 · 6月23日

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

7+阅读 · 6月23日

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

4+阅读 · 6月23日

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

6+阅读 · 6月22日

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

重新思考无人机时代的生存能力

在人工智能加速决策环境中拓展OODA循环

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

装甲突击旅：现代战争思考、战斗与组织

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

The Marginal Value of Momentum for Small Learning Rate SGD

Arxiv

0+阅读 · 2023年7月27日

Dynamics of specialization in neural modules under resource constraints

Arxiv

0+阅读 · 2023年7月27日

Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification

Arxiv

0+阅读 · 2023年7月27日

On the Learning Dynamics of Attention Networks

Arxiv

0+阅读 · 2023年7月26日

The Information Bottleneck's Ordinary Differential Equation: First-Order Root-Tracking for the IB

Arxiv

0+阅读 · 2023年7月25日

A Survey of Quantization Methods for Efficient Neural Network Inference

Arxiv

22+阅读 · 2021年6月21日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

Bayesian Deep Learning via Subnetwork Inference

Arxiv

10+阅读 · 2021年2月18日

Subgraph Neural Networks

Arxiv

27+阅读 · 2020年6月19日

How Powerful are Graph Neural Networks?

Arxiv

23+阅读 · 2018年10月1日

相关基金

锁模运转的低阈值掺铥激光器

国家自然科学基金

0+阅读 · 2014年12月31日

DACI1 调控Cyt b6/f 复合物组装的功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于掺镱光纤宽带光源泵浦产生的超连续谱研究

国家自然科学基金

0+阅读 · 2013年12月31日

AhR和VitA/RA途径在二噁英诱发腭裂中的相互作用

国家自然科学基金

0+阅读 · 2013年12月31日

MicRNA107调控BACE1mRNA基因与阿尔茨海默病内质网应激病理机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

新兴市场国家IFRS制定过程中的博弈及经济后果研究

国家自然科学基金

1+阅读 · 2012年12月31日

保守振动方程周期解的存在性研究

国家自然科学基金

0+阅读 · 2012年12月31日

Rydberg Blockade条件下的量子相干与量子信息处理的研究

国家自然科学基金

0+阅读 · 2012年12月31日

Bose-Hubbard模型量子相变的数值研究

国家自然科学基金

0+阅读 · 2011年12月31日

非线性约束预测控制研究及其在电站节能控制中的应用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员