We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis based on stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds on the regret that differ only by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning rate schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments on high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
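
To make the setting concrete, the following is a minimal sketch of SGD-based online linear regression under distribution shift. It is not the schedule derived in the paper. The random-walk drift model, the two hand-picked schedules, the regret definition (excess expected loss relative to the current ground-truth parameter), and all names and constants are assumptions made for illustration only.

```python
import numpy as np


def simulate_stream(T, d, drift_std, noise_std, seed=0):
    """Generate a data stream whose true regression parameter drifts over time.

    Assumption for this sketch: the ground-truth parameter follows a Gaussian
    random walk, which is one simple way to model distribution shift.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)
    X = rng.normal(size=(T, d))
    thetas = np.empty((T, d))
    y = np.empty(T)
    for t in range(T):
        thetas[t] = theta
        y[t] = X[t] @ theta + noise_std * rng.normal()
        theta = theta + drift_std * rng.normal(size=d)  # distribution shift
    return X, y, thetas


def cumulative_regret(X, y, thetas, schedule):
    """Run online SGD on the squared loss and accumulate a regret proxy.

    Assumption: with standard normal features, the excess expected loss of w
    against the current truth theta_t is 0.5 * ||w - theta_t||^2, and we use
    that quantity as the per-step regret.
    """
    T, d = X.shape
    w = np.zeros(d)
    total = 0.0
    for t in range(T):
        total += 0.5 * np.sum((w - thetas[t]) ** 2)
        grad = (X[t] @ w - y[t]) * X[t]  # gradient of 0.5 * (x.w - y)^2
        w = w - schedule(t) * grad
    return total


X, y, thetas = simulate_stream(T=2000, d=5, drift_std=0.05, noise_std=0.1)


def constant(t):
    """Hypothetical schedule that keeps the step size fixed."""
    return 0.05


def decaying(t):
    """Hypothetical schedule that decays roughly like 1/t."""
    return 0.1 / (1.0 + 0.05 * t)


regret_constant = cumulative_regret(X, y, thetas, constant)
regret_decaying = cumulative_regret(X, y, thetas, decaying)
print(f"constant schedule: {regret_constant:.1f}")
print(f"decaying schedule: {regret_decaying:.1f}")
```

Under this assumed drift model, a step size that decays toward zero gradually loses the ability to track the moving target, so the decaying schedule should accumulate noticeably more regret than the constant one. This is consistent with the intuition above that distribution shift calls for larger learning rates.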