A well-studied generalization of standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring a small cumulative constraint violation (CCV) against an adaptive adversary interacting over a horizon of length $T$. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. Furthermore, in the case of strongly convex cost and convex constraint functions, the regret guarantee can be improved to $O(\log T)$ while keeping the CCV bound the same as above. We establish these results by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization, a classic tool from control theory. Surprisingly, the analysis is short and elegant.
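To make the "AdaGrad plus Lyapunov optimization" idea concrete, here is a minimal, self-contained sketch on a hypothetical 1-D toy problem. It is an illustration of the general drift-plus-cost pattern, not the paper's exact algorithm or analysis: the learner takes AdaGrad-scaled gradient steps on a surrogate combining the cost gradient with a queue-weighted constraint gradient, while a Lyapunov virtual queue `Q` tracks accumulated constraint violation. All names, step sizes, and the toy cost/constraint functions are assumptions made for the example.

```python
import math

def coco_sketch(T=5000, eta=0.5, theta=1.0, c=0.5):
    """Toy COCO instance (illustrative only, not the paper's algorithm):
    minimize f_t(x) = (x - theta)^2 subject to g(x) = x - c <= 0, x in [-1, 1].
    Primal: AdaGrad step on the drift-plus-cost surrogate grad f + Q * grad g.
    Dual: virtual queue Q_{t+1} = max(Q_t + g(x_{t+1}), 0) (Lyapunov-style)."""
    x, Q, G2 = 0.0, 0.0, 1e-8   # action, virtual queue, accumulated squared grads
    x_star = min(theta, c)       # best fixed feasible comparator
    regret, ccv = 0.0, 0.0
    for _ in range(T):
        # losses incurred at the current action
        regret += (x - theta) ** 2 - (x_star - theta) ** 2
        ccv += max(x - c, 0.0)
        # gradient of the drift-plus-cost surrogate: grad f_t(x) + Q * grad g(x)
        grad = 2.0 * (x - theta) + Q * 1.0
        G2 += grad * grad
        # AdaGrad step followed by projection onto the feasible box [-1, 1]
        x = min(max(x - eta / math.sqrt(G2) * grad, -1.0), 1.0)
        # Lyapunov virtual queue accumulates (clipped) constraint violation
        Q = max(Q + (x - c), 0.0)
    return x, Q, regret, ccv
```

On this toy instance the iterates settle near the constraint boundary `x = c`, the queue `Q` stabilizes, and the total constraint violation stays bounded, illustrating how the queue term counteracts the unconstrained cost minimizer's pull across the boundary.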