Stochastic gradient descent (SGD) is a workhorse algorithm for solving large-scale optimization problems in data science and machine learning, so understanding its convergence is of fundamental importance. In this work we examine the convergence of SGD (with various step sizes) applied to unconstrained convex quadratic programming (essentially least-squares (LS) problems), and in particular analyze the error components with respect to the eigenvectors of the Hessian. The main message is that the convergence depends largely on the corresponding eigenvalues (the singular values of the coefficient matrix in the LS context): the components associated with the large singular values converge faster in the initial phase. We then show that there is a phase transition in the convergence, after which the convergence speed of the components, especially those corresponding to the larger singular values, decreases. Finally, we show that the convergence rate of the overall error (in the solution) tends to decay as more iterations are run; that is, the initial convergence is faster than the asymptotic rate.
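The initial-phase behavior described above can be illustrated numerically. The following is a minimal sketch, not the paper's experimental setup: the problem size, step size, and iteration count are illustrative choices, and a consistent (noiseless) LS system is assumed. It tracks the error components along the right singular vectors of the coefficient matrix, which are the eigenvectors of the Hessian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))          # coefficient matrix of the LS problem
x_star = rng.standard_normal(d)          # ground-truth solution
b = A @ x_star                           # consistent system: min_x ||Ax - b||^2

# Right singular vectors of A = eigenvectors of the Hessian A^T A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted in descending order

x = np.zeros(d)
eta = 0.002                              # constant step size (illustrative value)
comp0 = np.abs(Vt @ (x - x_star))        # initial error component per eigenvector

for _ in range(1500):
    i = rng.integers(n)                  # sample one row uniformly
    g = (A[i] @ x - b[i]) * A[i]         # stochastic gradient from that row
    x -= eta * g

comp = np.abs(Vt @ (x - x_star))         # error components after running SGD
# The component along the top singular direction (comp[0]) should have shrunk
# by a larger factor than the one along the bottom direction (comp[-1]).
```

The intuition: in expectation each SGD step contracts the error component along the i-th eigenvector by roughly a factor 1 - eta * s_i^2 / n, so directions with larger singular values shrink faster in the initial phase.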