Improving the Adaptive Moment Estimation (ADAM) stochastic optimizer through an Implicit-Explicit (IMEX) time-stepping approach

Adam · optimizer · estimation/estimator · moment · discretization

March 20, 2024



Abhinab Bhattacharjee, Andrey A. Popov, Arash Sarshar, Adrian Sandu

The Adam optimizer, often used in machine learning for neural network training, corresponds to an underlying ordinary differential equation (ODE) in the limit of very small learning rates. This work shows that the classical Adam algorithm is a first-order implicit-explicit (IMEX) Euler discretization of the underlying ODE. Taking the time-discretization point of view, we propose new extensions of the Adam scheme obtained by using higher-order IMEX methods to solve the ODE. Based on this approach, we derive a new optimization algorithm for neural network training that performs better than classical Adam on several regression and classification problems.
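To make the discretization view concrete, the following is a minimal sketch of the classical Adam update in its commonly published form (it is not the new higher-order algorithm proposed in the paper). One way to read it as an implicit-explicit step, under the assumption that the learning rate plays the role of the time step: the gradient is evaluated at the old parameters (the explicit part), while the parameter update uses the freshly updated moments (the implicit part). The function names `adam_step` and `minimize` and the quadratic test problem are illustrative assumptions, not taken from the paper.

```python
import math


def adam_step(theta, m, v, grad, k, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One classical Adam step on a list of floats; k is the 1-based step count."""
    # Explicit part: the gradient is evaluated at the OLD parameters.
    g = grad(theta)
    # Exponential moving averages of the gradient and the squared gradient.
    m_new = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
    v_new = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias-correction factors for the zero-initialized moments.
    c1 = 1.0 - beta1 ** k
    c2 = 1.0 - beta2 ** k
    # Implicit-looking part: the parameters are advanced with the NEW moments.
    theta_new = [
        t - lr * (mi / c1) / (math.sqrt(vi / c2) + eps)
        for t, mi, vi in zip(theta, m_new, v_new)
    ]
    return theta_new, m_new, v_new


def minimize(grad, theta0, steps=2000, lr=1e-2):
    """Run Adam for a fixed number of steps (hypothetical helper for illustration)."""
    theta = list(theta0)
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for k in range(1, steps + 1):
        theta, m, v = adam_step(theta, m, v, grad, k, lr=lr)
    return theta


def quadratic_grad(theta):
    """Gradient of the illustrative loss f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = theta
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]


if __name__ == "__main__":
    print(minimize(quadratic_grad, [0.0, 0.0]))
```

On the first step the bias-corrected moments reduce to the gradient and its square, so each parameter moves by roughly one learning rate in the direction opposite to the gradient's sign. The higher-order IMEX schemes mentioned in the abstract would replace this single Euler-like step with a multi-stage one; their details are not given here and are not reproduced in the sketch.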

