Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization

March 19, 2023

Peiyuan Zhang, Jiaye Teng, Jingzhao Zhang

From arXiv, 30 pages

The learning theory community has recently made progress in characterizing the generalization error of gradient methods for general convex losses. In this work, we focus on how training longer might affect generalization in smooth stochastic convex optimization (SCO) problems. We first provide tight lower bounds for general non-realizable SCO problems. Furthermore, existing upper bound results suggest that sample complexity can be improved by assuming the loss is realizable, i.e., an optimal solution simultaneously minimizes the loss on all data points. However, this improvement is compromised when the training time is long, and lower bounds are lacking. Our paper examines this observation by providing excess risk lower bounds for gradient descent (GD) and stochastic gradient descent (SGD) in two realizable settings: (1) realizable with $T = O(n)$, and (2) realizable with $T = \Omega(n)$, where $T$ denotes the number of training iterations and $n$ is the size of the training dataset. These bounds are novel and informative in characterizing the relationship between $T$ and $n$. In the first case, with a small training horizon, our lower bounds almost tightly match the corresponding upper bounds and provide the first optimality certificates for them. However, for the realizable case with $T = \Omega(n)$, a gap exists between the lower and upper bounds. We conjecture that this gap can be closed by improving the upper bounds, a conjecture supported by our analyses in one-dimensional and linear regression scenarios.
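To make the terms in the abstract concrete, the following is a minimal sketch of a realizable one-dimensional linear regression problem trained with full-batch GD. It is an illustration only, not the paper's lower-bound construction: the function name `gd_excess_risk`, the uniform input distribution, the step size `eta`, and the target `w_star` are all assumptions chosen for simplicity. Labels are noiseless, so `w_star` minimizes the loss on every data point (the realizable setting), and the excess risk can be computed in closed form.

```python
import random


def gd_excess_risk(n, T, eta=0.5, w_star=1.0, seed=0):
    """Run T steps of full-batch GD on n samples; return the excess population risk.

    Hypothetical setup (not taken from the paper):
      x ~ Uniform(-1, 1), y = w_star * x (noiseless, hence realizable),
      per-sample loss f(w; x) = 0.5 * (w * x - y)^2, which is 1-smooth since x^2 <= 1.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    # Empirical second moment: the empirical risk is 0.5 * a_hat * (w - w_star)^2.
    a_hat = sum(x * x for x in xs) / n

    w = 0.0  # initial iterate
    for _ in range(T):
        grad = a_hat * (w - w_star)  # gradient of the empirical risk at w
        w -= eta * grad

    # Population risk is 0.5 * E[x^2] * (w - w_star)^2 with E[x^2] = 1/3,
    # and its minimum is 0, so this value is the excess risk.
    return 0.5 * (1.0 / 3.0) * (w - w_star) ** 2


if __name__ == "__main__":
    n = 20
    for T in (0, n // 2, n, 10 * n):
        print(f"n={n}, T={T}: excess risk = {gd_excess_risk(n, T):.3e}")
```

In this toy problem the excess risk shrinks geometrically with $T$ for any fixed $n$, so it does not exhibit the hard instances that the lower bounds concern; it only shows how $T$, $n$, realizability, and excess risk fit together.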
