Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles

In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows to derive near matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.

翻译：本文提出了一种新颖的分析框架，用于研究当梯度仅能通过预测器给出的部分观测值访问时，统计学习问题中一阶优化算法的泛化误差。我们的分析依赖于梯度相对于数据样本的正则性，并能够为多个学习问题（包括监督学习、迁移学习、鲁棒学习、分布式学习以及使用梯度量化的通信高效学习）导出近乎匹配的泛化误差上界和下界。这些结果适用于光滑且强凸的优化问题，以及满足Polyak-Lojasiewicz假设的光滑非凸优化问题。特别地，我们的上界和下界取决于一个扩展条件标准差概念的新颖量，该量衡量了通过访问预测器来近似梯度的程度。因此，我们的分析为以下直觉提供了精确含义：统计学习目标的优化与其梯度估计同样困难。最后，我们证明，在标准监督学习情形下，采用递增批量大小和热启动的小批量梯度下降法能够达到最优泛化误差（仅相差一个乘法因子），从而为该优化方案在实际应用中的使用提供了理论依据。

相关内容

泛化误差

关注 107

学习方法的泛化能力（Generalization Error）是由该方法学习到的模型对未知数据的预测能力，是学习方法本质上重要的性质。现实中采用最多的办法是通过测试泛化误差来评价学习方法的泛化能力。泛化误差界刻画了学习算法的经验风险与期望风险之间偏差和收敛速度。一个机器学习的泛化误差（Generalization Error），是一个描述学生机器在从样品数据中学习之后，离教师机器之间的差距的函数。

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日