A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent - 专知论文

会员服务 ·

0

随机梯度下降 · INFORMS · 估计/估计量 · 优化器 · 可交换的 ·

2021 年 1 月 30 日

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

翻译：分布式沙粒渐变后裔的短暂时间的尖锐估计

Shi Pu,Alex Olshevsky,Ioannis Ch. Paschalidis

This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves the optimal network independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate, which we show behaves as $K_T=\mathcal{O}\left(\frac{n}{(1-\rho_w)^2}\right)$, where $1-\rho_w$ denotes the spectral gap of the mixing matrix. Moreover, we construct a "hard" optimization problem for which we show the transient time needed for DSGD to approach the asymptotic convergence rate is lower bounded by $\Omega \left(\frac{n}{(1-\rho_w)^2} \right)$, implying the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.

翻译：本文关注的是将代理商可以相互沟通和交流信息的网络的成本平均值降到最低。我们考虑只提供噪音梯度信息的设置。为了解决问题, 我们研究分布式随机梯度下降法( DSGD ), 并进行非无症状的趋同分析。对于强烈的调和和和平稳客观功能, DSGD 将实现与集中式梯度下降(SGD ) 相比的最佳网络独立趋同率。我们的主要贡献是描述 DSGD 接近无症状趋同率所需的短暂时间, 我们显示其表现为 $K_ Támathcal{ O ⁇ ázleft (\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

0

相关内容

随机梯度下降

随机梯度下降

随机梯度下降，按照数据生成分布抽取m个样本，通过计算他们梯度的平均值来更新梯度。

计算机理论顶会STOC 2021奖项出炉，滕尚华等华人学者获奖

专知会员服务

8+阅读 · 2021年7月22日

【经典书】计算理论导论，482页pdf

【经典书】计算理论导论，482页pdf

专知会员服务

87+阅读 · 2021年4月10日

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

专知会员服务

75+阅读 · 2021年1月10日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

60+阅读 · 2020年11月21日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

254+阅读 · 2020年4月19日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

专知会员服务

22+阅读 · 2019年8月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

12+阅读 · 2018年11月10日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

On the Hyperparameters in Stochastic Gradient Descent with Momentum

Arxiv

0+阅读 · 2021年8月9日

Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications

Arxiv

0+阅读 · 2021年8月8日

Approximation of SDEs -- a stochastic sewing approach

Arxiv

0+阅读 · 2021年8月8日

On the Convergence Rate of Projected Gradient Descent for a Back-Projection based Objective

Arxiv

0+阅读 · 2021年8月8日

Faster Rates of Differentially Private Stochastic Convex Optimization

Arxiv

0+阅读 · 2021年8月7日

Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem

Arxiv

0+阅读 · 2021年8月6日

Toward Efficient Online Scheduling for Distributed Machine Learning Systems

Arxiv

0+阅读 · 2021年8月6日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

VIP会员

文章信息

相关主题

随机梯度下降

估计/估计量

最新内容

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

0+阅读 · 31分钟前

多模态代码智能综述：从视觉输入到可执行代码系统

多模态代码智能综述：从视觉输入到可执行代码系统

专知会员服务

0+阅读 · 33分钟前

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

3+阅读 · 今天8:18

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

3+阅读 · 今天7:39

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

9+阅读 · 今天7:33

《通过小型无人机系统将情报能力“作战化”》

《通过小型无人机系统将情报能力“作战化”》

专知会员服务

3+阅读 · 今天7:28

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

专知会员服务

6+阅读 · 今天7:14

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

18+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

7+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

8+阅读 · 6月15日

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《网络空间兵棋推演：挑战、局限性与混合路径》报告

专知会员服务

8+阅读 · 6月15日

《离线语言支持系统：面向空战战术决策》

《离线语言支持系统：面向空战战术决策》

专知会员服务

8+阅读 · 6月15日

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

专知会员服务

7+阅读 · 6月15日

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

6+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

6+阅读 · 6月14日

相关VIP内容

计算机理论顶会STOC 2021奖项出炉，滕尚华等华人学者获奖

专知会员服务

8+阅读 · 2021年7月22日

【经典书】计算理论导论，482页pdf

【经典书】计算理论导论，482页pdf

专知会员服务

87+阅读 · 2021年4月10日

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

专知会员服务

75+阅读 · 2021年1月10日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

60+阅读 · 2020年11月21日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

254+阅读 · 2020年4月19日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

专知会员服务

22+阅读 · 2019年8月10日

热门VIP内容

开通专知VIP会员享更多权益服务

多模态代码智能综述：从视觉输入到可执行代码系统

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

12+阅读 · 2018年11月10日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

On the Hyperparameters in Stochastic Gradient Descent with Momentum

Arxiv

0+阅读 · 2021年8月9日

Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications

Arxiv

0+阅读 · 2021年8月8日

Approximation of SDEs -- a stochastic sewing approach

Arxiv

0+阅读 · 2021年8月8日

On the Convergence Rate of Projected Gradient Descent for a Back-Projection based Objective

Arxiv

0+阅读 · 2021年8月8日

Faster Rates of Differentially Private Stochastic Convex Optimization

Arxiv

0+阅读 · 2021年8月7日

Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem

Arxiv

0+阅读 · 2021年8月6日

Toward Efficient Online Scheduling for Distributed Machine Learning Systems

Arxiv

0+阅读 · 2021年8月6日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

微信扫码咨询专知VIP会员