We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds on the regret that differ only by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning rate schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
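To make the setting concrete, the following is a minimal sketch of SGD-based online linear regression under distribution shift. It is not the schedule derived in this work. Everything specific here is an assumption made for illustration: the function name `run_online_sgd`, the random-walk model for the drifting target parameter, the isotropic Gaussian features, and the two example schedules. Regret is measured as the cumulative expected excess loss against the current optimum, which under the isotropic-feature assumption equals half the squared parameter error.

```python
import numpy as np


def run_online_sgd(schedule, T=2000, d=5, drift=0.01, noise=0.1, seed=0):
    """Cumulative regret of online SGD on a drifting linear regression task.

    schedule: callable mapping the step index t to a learning rate.
    The true parameter follows a Gaussian random walk (an assumed shift model).
    """
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(d)  # time-varying ground-truth parameter
    theta = np.zeros(d)       # SGD iterate
    regret = 0.0
    for t in range(T):
        # Distribution shift: the optimum moves a little at every step.
        theta_star = theta_star + drift * rng.standard_normal(d)
        x = rng.standard_normal(d)
        y = x @ theta_star + noise * rng.standard_normal()
        # Expected excess squared loss of the current iterate, valid for
        # isotropic unit-variance features (assumption).
        regret += 0.5 * np.sum((theta - theta_star) ** 2)
        # One SGD step on the loss 0.5 * (x @ theta - y) ** 2.
        theta = theta - schedule(t) * (x @ theta - y) * x
    return regret


if __name__ == "__main__":
    constant = run_online_sgd(lambda t: 0.05)
    decaying = run_online_sgd(lambda t: 0.1 / (1.0 + 0.1 * t))
    print(f"constant schedule regret: {constant:.2f}")
    print(f"decaying schedule regret: {decaying:.2f}")
```

In this toy setup a schedule that decays toward zero stops tracking the moving optimum, so its cumulative regret grows much faster than that of a schedule that keeps the learning rate bounded away from zero. This is only a qualitative illustration of why distribution shift favors larger learning rates.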