Convergence of Adam under Relaxed Assumptions - 专知论文

会员服务 ·

0

Adam · 平滑 · 优化器 · 矩 · 平稳的 ·

2023 年 6 月 5 日

Convergence of Adam under Relaxed Assumptions

翻译：宽松假设下Adam算法的收敛性

Haochuan Li,Alexander Rakhlin,Ali Jadbabaie

from arxiv, 33 pages

In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimate (Adam) algorithm for a wide class of optimization objectives. Despite the popularity and efficiency of the Adam algorithm in training deep neural networks, its theoretical properties are not yet fully understood, and existing convergence proofs require unrealistically strong assumptions, such as globally bounded gradients, to show the convergence to stationary points. In this paper, we show that Adam provably converges to $\epsilon$-stationary points with $\mathcal{O}(\epsilon^{-4})$ gradient complexity under far more realistic conditions. The key to our analysis is a new proof of boundedness of gradients along the optimization trajectory of Adam, under a generalized smoothness assumption according to which the local smoothness (i.e., Hessian norm when it exists) is bounded by a sub-quadratic function of the gradient norm. Moreover, we propose a variance-reduced version of Adam with an accelerated gradient complexity of $\mathcal{O}(\epsilon^{-3})$.

翻译：本文给出了自适应矩估计(Adam)算法在广泛优化目标上的严格收敛性证明。尽管Adam算法在深度神经网络训练中具有广泛的应用性和高效性，但其理论性质尚未完全明确，且现有的收敛性证明需要全局有界梯度等不切实际的强假设，才能证明确切收敛至驻点。本文证明：在远更实际条件下，Adam算法能以$\mathcal{O}(\epsilon^{-4})$的梯度复杂度可靠收敛至$\epsilon$-驻点。分析的关键在于：在广义光滑性假设下（即局部光滑性（若存在Hessian范数时）受梯度范数的次二次函数约束），我们首次证明了Adam优化轨迹上梯度的有界性。此外，我们提出了方差缩减版本的Adam算法，其加速梯度复杂度达到$\mathcal{O}(\epsilon^{-3})$。

0

相关内容

Adam

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

专知会员服务

73+阅读 · 2022年7月11日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

Numb在肾脏细胞自噬中的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

Heisenberg群与Minkowski空间中的非线性椭圆方程

国家自然科学基金

0+阅读 · 2014年12月31日

蓖麻矮化相关RcDof基因功能分析及调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

SCI后铁超载及其致Ferroptosis在白质继发损伤中的作用研究

国家自然科学基金

0+阅读 · 2014年12月31日

AlGaN/GaN MIS-HEMT器件在质子辐射下的退化机理，寿命预测模型与加固技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

热激蛋白CPN10在组蛋白转录调控中的作用和机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Internet环境下组合式软件的时空进程代数刻画及模型检测

国家自然科学基金

0+阅读 · 2012年12月31日

医疗服务中的资源调度与优化

国家自然科学基金

4+阅读 · 2011年12月31日

仿射流形上的非线性分析

国家自然科学基金

0+阅读 · 2008年12月31日

Relationship between Batch Size and Number of Steps Needed for Nonconvex Optimization of Stochastic Gradient Descent using Armijo Line Search

Arxiv

1+阅读 · 2023年7月25日

On the Error-Reducing Properties of Superposition Codes

Arxiv

0+阅读 · 2023年7月25日

Approximations of dispersive PDEs in the presence of low-regularity randomness

Arxiv

0+阅读 · 2023年7月24日

Multi-dimensional summation-by-parts operators for general function spaces: Theory and construction

Arxiv

0+阅读 · 2023年7月23日

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Arxiv

0+阅读 · 2023年7月21日

Bayesian taut splines for estimating the number of modes

Arxiv

0+阅读 · 2023年7月21日

Beyond Convergence: Identifiability of Machine Learning and Deep Learning Models

Arxiv

0+阅读 · 2023年7月21日

The activity-weight duality in feed forward neural networks: The geometric determinants of generalization

Arxiv

0+阅读 · 2023年7月21日

A Convergence Rate for Manifold Neural Networks

Arxiv

0+阅读 · 2023年7月20日

Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case

Arxiv

0+阅读 · 2023年7月20日

VIP会员

文章信息

相关主题

最新内容

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

专知会员服务

4+阅读 · 今天12:53

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

专知会员服务

3+阅读 · 今天12:39

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

专知会员服务

2+阅读 · 今天12:32

乌克兰纵深打击如何重塑俄罗斯的战略选择

乌克兰纵深打击如何重塑俄罗斯的战略选择

专知会员服务

1+阅读 · 今天12:25

《分布式太空任务对比分析与综合建模及仿真环境》120页

《分布式太空任务对比分析与综合建模及仿真环境》120页

专知会员服务

1+阅读 · 今天12:14

俄乌战争中关于中程打击无人机部署的经验启示

俄乌战争中关于中程打击无人机部署的经验启示

专知会员服务

0+阅读 · 今天12:08

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

5+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

4+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

6+阅读 · 7月23日

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

2+阅读 · 7月23日

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

4+阅读 · 7月23日

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

11+阅读 · 7月22日

相关VIP内容

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

专知会员服务

73+阅读 · 2022年7月11日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

乌克兰纵深打击如何重塑俄罗斯的战略选择

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

相关资讯

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

Relationship between Batch Size and Number of Steps Needed for Nonconvex Optimization of Stochastic Gradient Descent using Armijo Line Search

Arxiv

1+阅读 · 2023年7月25日

On the Error-Reducing Properties of Superposition Codes

Arxiv

0+阅读 · 2023年7月25日

Approximations of dispersive PDEs in the presence of low-regularity randomness

Arxiv

0+阅读 · 2023年7月24日

Multi-dimensional summation-by-parts operators for general function spaces: Theory and construction

Arxiv

0+阅读 · 2023年7月23日

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Arxiv

0+阅读 · 2023年7月21日

Bayesian taut splines for estimating the number of modes

Arxiv

0+阅读 · 2023年7月21日

Beyond Convergence: Identifiability of Machine Learning and Deep Learning Models

Arxiv

0+阅读 · 2023年7月21日

The activity-weight duality in feed forward neural networks: The geometric determinants of generalization

Arxiv

0+阅读 · 2023年7月21日

A Convergence Rate for Manifold Neural Networks

Arxiv

0+阅读 · 2023年7月20日

Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case

Arxiv

0+阅读 · 2023年7月20日

相关基金

Numb在肾脏细胞自噬中的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

Heisenberg群与Minkowski空间中的非线性椭圆方程

国家自然科学基金

0+阅读 · 2014年12月31日

蓖麻矮化相关RcDof基因功能分析及调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

SCI后铁超载及其致Ferroptosis在白质继发损伤中的作用研究

国家自然科学基金

0+阅读 · 2014年12月31日

AlGaN/GaN MIS-HEMT器件在质子辐射下的退化机理，寿命预测模型与加固技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

热激蛋白CPN10在组蛋白转录调控中的作用和机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Internet环境下组合式软件的时空进程代数刻画及模型检测

国家自然科学基金

0+阅读 · 2012年12月31日

医疗服务中的资源调度与优化

国家自然科学基金

4+阅读 · 2011年12月31日

仿射流形上的非线性分析

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员