Stochastic and adversarial data are two widely studied settings in online learning. However, many optimization tasks are neither i.i.d. nor fully adversarial, so a better theoretical understanding of the regime between these extremes is of fundamental interest. In this work we establish novel regret bounds for online convex optimization in a setting that interpolates between stochastic i.i.d. and fully adversarial losses. By exploiting smoothness of the expected losses, these bounds replace the dependence on the maximum gradient norm with a dependence on the variance of the gradients, an improvement that was previously known only for linear losses. In addition, they weaken the i.i.d. assumption by allowing, for example, adversarially poisoned rounds, which were previously considered in the related expert and bandit settings. In the fully i.i.d. case, our regret bounds match the rates one would expect from results in stochastic acceleration, and we also recover the optimal stochastically accelerated rates via online-to-batch conversion. In the fully adversarial case, our bounds gracefully deteriorate to match the minimax regret. We further provide lower bounds showing that our regret upper bounds are tight for all intermediate regimes in terms of the stochastic variance and the adversarial variation of the loss gradients.
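To make the notions of regret and online-to-batch conversion concrete, the following is a minimal sketch under stated assumptions. It is not the algorithm analysed in this work: it uses plain projected online gradient descent on hypothetical one-dimensional quadratic losses f_t(x) = 0.5 (x - z_t)^2 with i.i.d. Gaussian z_t. The names `run_ogd`, `regret`, and `online_to_batch`, the step size `step / sqrt(t)`, and all parameter values are illustrative choices. The sketch measures regret against the best fixed point in hindsight and then averages the iterates, which is the standard online-to-batch conversion for convex losses.

```python
import random


def project(x, radius):
    """Project x onto the interval [-radius, radius]."""
    return max(-radius, min(radius, x))


def run_ogd(targets, radius=1.0, step=1.0):
    """Projected online gradient descent on f_t(x) = 0.5 * (x - z_t)^2.

    Assumption: step size step / sqrt(t), a common default for convex losses.
    Returns the iterates played and the losses incurred in each round.
    """
    x = 0.0
    iterates, losses = [], []
    for t, z in enumerate(targets, start=1):
        iterates.append(x)                 # play x_t before seeing z_t
        losses.append(0.5 * (x - z) ** 2)  # suffer f_t(x_t)
        grad = x - z                       # gradient of f_t at x_t
        x = project(x - step / t ** 0.5 * grad, radius)
    return iterates, losses


def regret(targets, losses, radius=1.0):
    """Cumulative loss minus the loss of the best fixed point in hindsight."""
    # For these quadratics the best fixed point is the projected sample mean.
    best = project(sum(targets) / len(targets), radius)
    best_loss = sum(0.5 * (best - z) ** 2 for z in targets)
    return sum(losses) - best_loss


def online_to_batch(iterates):
    """Online-to-batch conversion: return the average of the iterates."""
    return sum(iterates) / len(iterates)


# Illustrative i.i.d. run; mu, sigma and T are arbitrary choices.
rng = random.Random(0)
T, mu, sigma = 2000, 0.3, 0.1
targets = [rng.gauss(mu, sigma) for _ in range(T)]
iterates, losses = run_ogd(targets)
x_bar = online_to_batch(iterates)
print("average regret:", regret(targets, losses) / T)
print("excess expected loss of averaged iterate:", 0.5 * (x_bar - mu) ** 2)
```

For i.i.d. convex losses, Jensen's inequality implies that the expected excess loss of the averaged iterate is at most the expected regret divided by the number of rounds, which is why a regret bound translates into a rate for stochastic optimization. Replacing some entries of `targets` with adversarially chosen values would give a rough picture of the poisoned-rounds regime mentioned above.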