Online convex optimization (OCO) is a widely used framework in online learning. In each round, the learner chooses a decision in a convex set and an adversary chooses a convex loss function, and then the learner suffers the loss associated with their current decision. However, in many applications the learner's loss depends not only on the current decision but on the entire history of decisions until that point. The OCO framework and its existing generalizations do not capture this, and they can only be applied to many settings of interest after a long series of approximation arguments. They also leave open the question of whether the dependence on memory is tight because there are no non-trivial lower bounds. In this work we introduce a generalization of the OCO framework, ``Online Convex Optimization with Unbounded Memory'', that captures long-term dependence on past decisions. We introduce the notion of $p$-effective memory capacity, $H_p$, that quantifies the maximum influence of past decisions on present losses. We prove an $O(\sqrt{H_p T})$ upper bound on the policy regret and a matching (worst-case) lower bound. As a special case, we prove the first non-trivial lower bound for OCO with finite memory~\citep{anavaHM2015online}, which could be of independent interest, and also improve existing upper bounds. We demonstrate the broad applicability of our framework by using it to derive regret bounds, and to improve and simplify existing regret bound derivations, for a variety of online learning problems including online linear control and an online variant of performative prediction.