Constrained online convex optimization (COCO) is a well-studied generalization of standard online convex optimization (OCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after it chooses its action for that round. The objective is to design an online policy that simultaneously achieves small regret and small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this question in the affirmative, up to logarithmic factors in the CCV bound: we show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization, a classic tool from control theory. Surprisingly, the analysis is short and elegant.
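To make the COCO protocol concrete, the following is a minimal toy sketch and not the paper's algorithm as stated. It assumes a one-dimensional action set $[-1, 1]$, a fixed cost $f_t(x) = (x-1)^2$, a fixed constraint $g_t(x) = x - 0.5 \le 0$, a quadratic Lyapunov function $\Phi(Q) = Q^2$, and an AdaGrad-style step size; the function name `run_coco` and the parameter `V` are hypothetical and chosen only for illustration. The learner plays an action, observes the cost and constraint afterwards, and then takes a gradient step on a surrogate that weights the constraint by the violation accumulated so far.

```python
import math


def clip(x, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def run_coco(T=200, lo=-1.0, hi=1.0, V=1.0):
    """Toy 1-D COCO loop: AdaGrad-style steps on a Lyapunov-weighted surrogate.

    All modelling choices here (cost, constraint, Phi(Q) = Q^2, step size)
    are illustrative assumptions, not taken from the abstract.
    """
    x = 0.0              # initial action
    Q = 0.0              # cumulative constraint violation (CCV) so far
    grad_sq_sum = 0.0    # running sum of squared surrogate gradients
    diameter = hi - lo
    actions = []
    total_cost = 0.0

    for _ in range(T):
        actions.append(x)

        # f_t and g_t are revealed only after the action x has been played.
        cost = (x - 1.0) ** 2
        violation = max(0.0, x - 0.5)
        total_cost += cost
        Q += violation

        # Subgradient of the surrogate V * f_t(x) + Phi'(Q) * max(0, g_t(x)),
        # with Phi(Q) = Q^2, so Phi'(Q) = 2 * Q.
        grad_f = 2.0 * (x - 1.0)
        grad_g = 1.0 if x > 0.5 else 0.0
        grad = V * grad_f + 2.0 * Q * grad_g

        # AdaGrad-style adaptive step, followed by projection onto [lo, hi].
        grad_sq_sum += grad * grad
        if grad_sq_sum > 0.0:
            step = diameter / math.sqrt(2.0 * grad_sq_sum)
            x = clip(x - step * grad, lo, hi)

    # Regret is measured against the best fixed feasible action, x* = 0.5.
    best_cost = T * (0.5 - 1.0) ** 2
    regret = total_cost - best_cost
    return actions, regret, Q
```

Calling `run_coco(T=200)` returns the played actions, the regret against the best fixed feasible action, and the CCV. Because infeasible actions can incur lower cost than any feasible one, the regret in this toy setting may be negative, which is why regret and CCV have to be controlled jointly.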