We study Constrained Online Convex Optimization (COCO), in which a learner iteratively chooses actions and only then observes a convex loss function and a convex constraint function, accumulating loss while incurring penalties for constraint violations. We introduce CLASP (Convex Losses And Squared Penalties), an algorithm that jointly minimizes cumulative loss and cumulative squared constraint violation. Our analysis departs from prior work by fully exploiting the firm nonexpansiveness of convex projection operators, a proof strategy not previously applied in this setting. For convex losses, CLASP achieves regret $O\left(T^{\max\{\beta,1-\beta\}}\right)$ and cumulative squared penalty $O\left(T^{1-\beta}\right)$ for any $\beta\in (0,1)$. Most importantly, for strongly convex problems, CLASP provides the first simultaneous logarithmic guarantees: both the regret and the cumulative squared penalty are bounded by $O(\log T)$.
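The property driving the analysis, firm nonexpansiveness of the Euclidean projection $P$ onto a convex set, states that $\|P(x)-P(y)\|^2 \le \langle P(x)-P(y),\, x-y\rangle$ for all $x, y$. A minimal numerical sketch, using projection onto the unit ball as an illustrative convex set (this is not the paper's algorithm, only a check of the inequality):

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball of given radius (a convex set)
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    px, py = project_ball(x), project_ball(y)
    lhs = np.dot(px - py, px - py)  # ||P(x) - P(y)||^2
    rhs = np.dot(px - py, x - y)    # <P(x) - P(y), x - y>
    # firm nonexpansiveness: lhs <= rhs (small tolerance for float error)
    assert lhs <= rhs + 1e-12
```

Firm nonexpansiveness is strictly stronger than the usual $1$-Lipschitz property of projections (the latter follows from it by Cauchy-Schwarz), which is what gives the analysis its extra leverage.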