Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

Generalization theory · SGD · Functional · Loss function (machine learning) · Generalization error

January 30, 2023

Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli

From arXiv. The first two authors contributed equally to this work.

Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and the generalization behavior of SGD. To address this empirical phenomenon theoretically, several works have made strong topological and statistical assumptions to link the generalization error to heavy tails. Very recently, new generalization bounds have been proven that indicate a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations. While these bounds do not require additional topological assumptions, given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they apply only to simple quadratic problems. In this paper, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well. Our approach is based on developing Wasserstein stability bounds for heavy-tailed SDEs and their discretizations, which we then convert to generalization bounds. Our results do not require any nontrivial assumptions; yet, they shed more light on the empirical observations, thanks to the generality of the loss functions.
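To make the pipeline described in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' code or experimental setup. It assumes the commonly used model in which heavy-tailed SGD is approximated by the Lévy-driven SDE dX_t = -∇F(X_t; S) dt + σ dL_t^α, where L^α is a symmetric α-stable process, and it simulates the Euler-type discretization X_{k+1} = X_k - η∇F(X_k; S) + σ η^{1/α} ξ_k with i.i.d. standard symmetric α-stable ξ_k; the exact SDE, discretization, and assumptions in the paper may differ. The one-dimensional non-convex loss f(x; z) = ½(x - z)² + 2cos(x), the synthetic data, the helper names, and every parameter value are hypothetical. In the spirit of algorithmic stability, the chain is run on two datasets S and S′ that differ in a single point, and the 1-Wasserstein distance between the two iterate distributions is estimated.

```python
import math

import numpy as np


def symmetric_stable(alpha, size, rng):
    """Standard symmetric alpha-stable samples (skewness beta = 0) drawn with the
    Chambers-Mallows-Stuck method. alpha = 2 gives N(0, 2); alpha = 1 gives Cauchy."""
    v = rng.uniform(-math.pi / 2, math.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(v)
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def grad_empirical_risk(x, data):
    """Gradient of F(x; S) = mean_z f(x; z) for the HYPOTHETICAL non-convex loss
    f(x; z) = 0.5 * (x - z)**2 + 2 * cos(x), whose curvature 1 - 2 cos(x) is negative near 0."""
    return (x - data.mean()) - 2.0 * np.sin(x)


def run_chains(data, alpha, sigma, eta, n_steps, n_chains, rng):
    """Assumed Euler-type step X_{k+1} = X_k - eta * grad F(X_k; S) + sigma * eta**(1/alpha) * xi_k,
    run for n_chains independent chains started at 0."""
    if not 1.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (1, 2] so that W1 is finite")
    x = np.zeros(n_chains)
    # An alpha-stable increment over a time step eta has scale eta**(1/alpha) (self-similarity).
    noise_scale = sigma * eta ** (1.0 / alpha)
    for _ in range(n_steps):
        x = x - eta * grad_empirical_risk(x, data) + noise_scale * symmetric_stable(alpha, n_chains, rng)
    return x


def empirical_w1(xs, ys):
    """1-Wasserstein distance between two equal-size empirical measures on the real line:
    the mean absolute difference of the sorted samples."""
    return float(np.mean(np.abs(np.sort(xs) - np.sort(ys))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=100)          # synthetic dataset S
    data_prime = data.copy()
    data_prime[0] = 5.0                  # neighboring dataset S': one point replaced
    for alpha in (1.2, 1.6, 2.0):
        x = run_chains(data, alpha, 1.0, 0.01, 500, 5000, rng)
        y = run_chains(data_prime, alpha, 1.0, 0.01, 500, 5000, rng)
        print(f"alpha={alpha}: estimated W1 = {empirical_w1(x, y):.4f}")
```

The stable sampler is written out by hand so that the sketch depends only on NumPy. Because the two sets of chains are simulated independently, the empirical W1 value includes Monte Carlo error and would not be exactly zero even for S = S′, so it is a rough numerical illustration rather than a bound. The tail index is restricted to (1, 2] because a symmetric α-stable variable has a finite mean only for α > 1, which W1 requires.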

