In this work, we study online convex optimization with a fixed constraint function $g : \mathbb{R}^d \rightarrow \mathbb{R}$. Prior work on this problem has shown $O(\sqrt{T})$ regret and cumulative constraint satisfaction $\sum_{t=1}^{T} g(x_t) \leq 0$, while only accessing the constraint value and subgradient at the played actions, $g(x_t), \partial g(x_t)$. Using the same constraint information, we show a stronger guarantee of anytime constraint satisfaction, $g(x_t) \leq 0 \ \forall t \in [T]$, with a matching $O(\sqrt{T})$ regret guarantee. These contributions stem from our approach of using Polyak feasibility steps to ensure constraint satisfaction without sacrificing regret. Specifically, after each step of online gradient descent, our algorithm applies a subgradient descent step on the constraint function, with the step-size chosen according to the celebrated Polyak step-size. We further validate this approach with numerical experiments.
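The update described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact algorithm or parameters: it assumes linear losses $f_t(x) = c_t^\top x$, the unit-ball constraint $g(x) = \lVert x \rVert - 1$, and a standard $1/\sqrt{T}$ step-size for the online gradient descent step. The Polyak feasibility step uses the step-size $g(x)/\lVert \partial g(x) \rVert^2$ and is applied only when the iterate is infeasible.

```python
import numpy as np

# Illustrative sketch: OGD followed by a Polyak feasibility step.
# Constraint g(x) = ||x|| - 1 <= 0 (unit ball); losses f_t(x) = c_t . x.
# The loss model and step-sizes here are assumptions for the demo.

rng = np.random.default_rng(0)
d, T = 5, 200
x = np.zeros(d)                      # feasible initial point, g(x) = -1
eta = 1.0 / np.sqrt(T)               # OGD step-size

for t in range(T):
    c = rng.normal(size=d)           # gradient of the linear loss f_t
    x = x - eta * c                  # online gradient descent step
    gx = np.linalg.norm(x) - 1.0     # constraint value g(x)
    if gx > 0:                       # infeasible: apply Polyak step
        sg = x / np.linalg.norm(x)   # subgradient of g at x
        x = x - (gx / np.dot(sg, sg)) * sg
    # anytime feasibility: every played action satisfies g(x) <= 0
    assert np.linalg.norm(x) - 1.0 <= 1e-9
```

For this particular constraint the Polyak step projects the iterate exactly back onto the unit ball (since $\lVert \partial g(x) \rVert = 1$, the step moves $x$ by exactly $g(x)$ along the unit radial direction), so every played action is feasible.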