We consider a generalization of the celebrated Online Convex Optimization (OCO) framework with adversarial online constraints. In this problem, an online learner interacts with an adversary sequentially over multiple rounds. At the beginning of each round, the learner chooses an action from a convex decision set. The adversary then reveals a convex cost function and a convex constraint function. The goal of the learner is to minimize the cumulative cost while keeping the cumulative constraint violation as small as possible. We present two efficient algorithms with simple modular structures that achieve universal dynamic regret and cumulative constraint violation bounds, improving upon state-of-the-art results. The first algorithm achieves the optimal regret bound but requires projection onto the constraint sets; the second algorithm is projection-free and achieves better violation bounds in rapidly varying environments. Our results hold in the most general case, in which both the cost and constraint functions are chosen arbitrarily and the constraint sets need not share any fixed common feasible point. We establish these results by introducing a general framework that reduces the constrained learning problem to an instance of the standard OCO problem with specially constructed surrogate cost functions.
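The interaction protocol and the idea of reducing the constrained problem to standard OCO can be illustrated with a small sketch. This is not the paper's algorithm: the penalty-style surrogate `f_t(x) + lam * max(g_t(x), 0)`, the step size `eta`, the weight `lam`, the interval decision set, and the randomly drawn "adversary" are all assumptions chosen for illustration only, and the names `project` and `run_protocol` are hypothetical.

```python
import random


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi] (a toy convex decision set)."""
    return min(max(x, lo), hi)


def run_protocol(T=200, eta=0.05, lam=5.0, seed=0):
    """Play T rounds of constrained OCO with online gradient descent on a surrogate cost.

    Assumed toy setup (not from the paper):
      cost        f_t(x) = (x - a_t)^2
      constraint  g_t(x) = x - b_t <= 0
      surrogate   f_t(x) + lam * max(g_t(x), 0)
    Returns (cumulative cost, cumulative constraint violation).
    """
    rng = random.Random(seed)
    x = 0.0                      # learner's action, chosen before the round is revealed
    total_cost = 0.0
    total_violation = 0.0
    for _ in range(T):
        # The "adversary" reveals the round's functions only after x is fixed.
        # A random draw stands in for an arbitrary adversarial choice.
        a = rng.uniform(-1.0, 1.0)
        b = rng.uniform(-0.5, 0.5)

        g = x - b
        total_cost += (x - a) ** 2
        total_violation += max(g, 0.0)

        # Subgradient of the surrogate cost at the current action.
        grad = 2.0 * (x - a) + (lam if g > 0 else 0.0)

        # Standard OCO update: gradient step, then projection onto the decision set only.
        x = project(x - eta * grad)
    return total_cost, total_violation


if __name__ == "__main__":
    cost, violation = run_protocol()
    print(f"cumulative cost: {cost:.3f}, cumulative violation: {violation:.3f}")
```

The update projects only onto the fixed decision set, never onto the per-round constraint sets; the constraint enters solely through the surrogate's subgradient. Increasing `lam` in this toy setup weights the constraint more heavily relative to the cost.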