We consider the setting of online convex optimization with adversarial time-varying constraints, in which actions must be feasible w.r.t. a fixed constraint set and are also required, on average, to approximately satisfy additional time-varying constraints. Motivated by scenarios in which the fixed feasible set (hard constraint) is difficult to project onto, we consider projection-free algorithms that access this set only through a linear optimization oracle (LOO). We present an algorithm that, on a sequence of length $T$ and using a total of $T$ calls to the LOO, guarantees $\tilde{O}(T^{3/4})$ regret w.r.t. the losses and $O(T^{7/8})$ constraint violation (ignoring all quantities except for $T$). In particular, these bounds hold w.r.t. any interval of the sequence. We also present a more efficient algorithm that requires only first-order oracle access to the soft constraints and achieves similar bounds w.r.t. the entire sequence. We extend the latter to the setting of bandit feedback and obtain similar bounds (as a function of $T$) in expectation.
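To make the notion of accessing the feasible set only through a linear optimization oracle concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes, purely for illustration, that the hard constraint set is an L1 ball, for which the LOO has a closed form, and it shows a single Frank-Wolfe-style update that stays feasible without any projection. The names `loo_l1_ball` and `frank_wolfe_step` are hypothetical.

```python
from typing import List


def loo_l1_ball(c: List[float], radius: float = 1.0) -> List[float]:
    """Hypothetical LOO for the L1 ball: returns a minimizer of <c, v>
    over {v : ||v||_1 <= radius}, which is always a signed vertex
    (or the origin when c is the zero vector)."""
    i = max(range(len(c)), key=lambda j: abs(c[j]))  # largest |c_j|
    v = [0.0] * len(c)
    if c[i] != 0.0:
        v[i] = -radius if c[i] > 0 else radius
    return v


def frank_wolfe_step(x: List[float], grad: List[float], step: float,
                     radius: float = 1.0) -> List[float]:
    """One projection-free update: a convex combination of the current
    point and the LOO output, so the iterate remains in the ball."""
    v = loo_l1_ball(grad, radius)
    return [(1.0 - step) * xi + step * vi for xi, vi in zip(x, v)]


if __name__ == "__main__":
    # Illustrative online loop with assumed losses
    # f_t(x) = 0.5 * ||x - z_t||^2; exactly one LOO call per round.
    targets = [[2.0, 0.0], [0.0, -2.0], [2.0, 0.0], [0.0, -2.0]]
    x = [0.0, 0.0]
    for t, z in enumerate(targets, start=1):
        grad = [xi - zi for xi, zi in zip(x, z)]
        x = frank_wolfe_step(x, grad, step=1.0 / (t + 1))
    print(x, sum(abs(xi) for xi in x))
```

Because each update is a convex combination of two feasible points, feasibility w.r.t. the hard constraint is preserved at the cost of one linear minimization per round, which is typically much cheaper than a Euclidean projection for sets such as nuclear-norm balls or polytopes.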