Projection operations are a typical computational bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M), which captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. In particular, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real time, accounting for how past decisions affect the present. Examples of such applications include online control of dynamical systems, statistical arbitrage, and time series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop a controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
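The abstract names two building blocks, Online Frank-Wolfe and Hedge, without showing how they fit together. The following is a minimal sketch of that meta-base idea, not the paper's algorithm: it ignores memory entirely, uses a plain one-step Frank-Wolfe update rather than the regularized OFW surrogate, and assumes an L1-ball feasible set, a quadratic tracking loss, and the step sizes shown. All function names (`lmo_l1_ball`, `frank_wolfe_step`, `hedge_update`, `combine`, `run_demo`) are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def lmo_l1_ball(grad: Sequence[float], radius: float) -> List[float]:
    """Linear minimization oracle: a minimizer of <grad, v> over the L1 ball."""
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    v = [0.0] * len(grad)
    if grad[i] != 0.0:
        # Move to the vertex that opposes the largest gradient coordinate.
        v[i] = -radius if grad[i] > 0 else radius
    return v


def frank_wolfe_step(x: Sequence[float], grad: Sequence[float],
                     step: float, radius: float) -> List[float]:
    """Convex combination of x and an LMO vertex; no projection is needed."""
    v = lmo_l1_ball(grad, radius)
    return [(1.0 - step) * xi + step * vi for xi, vi in zip(x, v)]


def hedge_update(weights: Sequence[float], losses: Sequence[float],
                 lr: float) -> List[float]:
    """Hedge: multiplicative-weights update followed by normalization."""
    scaled = [w * math.exp(-lr * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def combine(weights: Sequence[float],
            decisions: Sequence[Sequence[float]]) -> List[float]:
    """Meta decision: weighted average of the base learners' decisions."""
    dim = len(decisions[0])
    return [sum(w * d[j] for w, d in zip(weights, decisions))
            for j in range(dim)]


def run_demo(rounds: int = 50,
             radius: float = 1.0) -> Tuple[List[float], List[float]]:
    """Track a slowly rotating target with three base learners (assumed setup)."""
    steps = [0.05, 0.2, 0.8]  # assumed base step sizes, one per base learner
    bases = [[0.0, 0.0] for _ in steps]
    weights = [1.0 / len(steps)] * len(steps)
    x = combine(weights, bases)
    for t in range(rounds):
        # Assumed time-varying loss f_t(x) = ||x - target_t||^2.
        target = [0.5 * math.cos(0.1 * t), 0.5 * math.sin(0.1 * t)]
        losses = [sum((b[j] - target[j]) ** 2 for j in range(2)) for b in bases]
        weights = hedge_update(weights, losses, lr=1.0)
        bases = [
            frank_wolfe_step(
                b, [2.0 * (b[j] - target[j]) for j in range(2)], s, radius)
            for b, s in zip(bases, steps)
        ]
        x = combine(weights, bases)
    return x, weights
```

Because every base decision is a convex combination of feasible points and the meta decision is a convex combination of base decisions, the iterate stays inside the feasible set without any projection. Running several base learners with different step sizes and letting Hedge reweight them is one common way to hedge against an unknown rate of change in the comparator sequence.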