In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost coefficients $\lambda_t$ to vary arbitrarily over time. Our main contribution is a novel algorithm that achieves the first comparator-adaptive dynamic regret bound for this setting, guaranteeing $\widetilde{\mathcal{O}}(\sqrt{(1+P_T)(T+\sum_t \lambda_t)})$ regret, where $P_T$ is the path length of the comparator sequence over $T$ rounds. In the special case where $\lambda_t=0$ for all rounds, this recovers the optimal guarantees for both static and dynamic regret in standard OCO. To demonstrate the versatility of our results, we consider two applications: OCO with delayed feedback and OCO with time-varying memory. We show that both problems can be translated into time-varying movement costs; for the delayed feedback setting in particular, this yields a novel reduction that is of independent interest. Crucially, the first-order dependence on movement costs in our regret bound is what enables optimal comparator-adaptive dynamic regret guarantees in both settings.
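
To make the stated bound concrete, the following is a sketch of the quantities involved, assuming the common formulation in which the learner plays $w_t$, suffers a convex loss $f_t(w_t)$, and pays $\lambda_t\|w_t-w_{t-1}\|$ for moving. The symbols $w_t$, $u_t$, $f_t$, and $R_T$ are illustrative notation not fixed by the abstract, and the paper's exact definitions may differ (for instance, in whether the comparator's own movement is also charged).

$$
R_T(u_{1:T}) \;=\; \sum_{t=1}^{T}\bigl(f_t(w_t)-f_t(u_t)\bigr) \;+\; \sum_{t=1}^{T}\lambda_t\,\|w_t-w_{t-1}\|,
\qquad
P_T \;=\; \sum_{t=2}^{T}\|u_t-u_{t-1}\|.
$$

Under this reading, setting $\lambda_t=0$ for all $t$ reduces the bound to $\widetilde{\mathcal{O}}(\sqrt{(1+P_T)\,T})$, and a fixed comparator ($P_T=0$) further gives $\widetilde{\mathcal{O}}(\sqrt{T})$, matching the special case described above.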