We investigate online convex optimization in non-stationary environments and choose dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path length, which essentially reflects the non-stationarity of the environment. The state-of-the-art dynamic regret is $\mathcal{O}(\sqrt{T(1+P_T)})$. Although this bound is proved to be minimax optimal for convex functions, in this paper we demonstrate that the guarantee can be further improved for some easy problem instances, particularly when the online functions are smooth. Specifically, we introduce novel online algorithms that exploit smoothness and replace the dependence on $T$ in the dynamic regret with problem-dependent quantities: the variation in gradients of the loss functions, the cumulative loss of the comparator sequence, and the minimum of these two terms. These quantities are at most $\mathcal{O}(T)$ but can be much smaller in benign environments. Our results are therefore adaptive to the intrinsic difficulty of the problem: the bounds are tighter than existing results for easy problems while guaranteeing the same rate in the worst case. Notably, our proposed algorithms achieve favorable dynamic regret with only one gradient per iteration, sharing the same gradient query complexity as static regret minimization methods. To accomplish this, we introduce the framework of collaborative online ensemble. The proposed framework employs a two-layer online ensemble to handle non-stationarity, and it uses optimistic online learning together with crucial correction terms to facilitate effective collaboration between the meta and base layers, thereby attaining adaptivity. We believe that the framework can be useful for broader problems.
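For concreteness, the quantities described in words above can be written out as follows. This is a sketch in conventional notation and not taken from the text itself: the symbols $\mathcal{X}$ (feasible domain), $f_t$ (loss at round $t$), $x_t$ (the learner's decision), $u_t$ (the comparator at round $t$), $V_T$, and $F_T$ are assumed names, and the Euclidean norm and the exact form of $V_T$ are one common choice rather than a definition confirmed here.

$$
\begin{aligned}
\text{D-Regret}_T(u_1,\dots,u_T) &= \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t), \qquad u_1,\dots,u_T \in \mathcal{X},\\
P_T &= \sum_{t=2}^{T} \lVert u_{t-1} - u_t \rVert_2,\\
V_T &= \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \lVert \nabla f_t(x) - \nabla f_{t-1}(x) \rVert_2^2,\\
F_T &= \sum_{t=1}^{T} f_t(u_t).
\end{aligned}
$$

Under these assumed definitions, $V_T$ plays the role of the gradient variation and $F_T$ that of the comparator's cumulative loss. With bounded gradients and bounded losses, both are at most $\mathcal{O}(T)$. When the comparator is fixed ($u_1 = \dots = u_T$), $P_T = 0$ and the dynamic regret reduces to the usual static regret.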