Projection operations are a typical computational bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M). OCO-M captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. In particular, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real time, accounting for how past decisions affect the present. Examples of such applications include online control of dynamical systems, statistical arbitrage, and time series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop the first controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
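To make the two building blocks named in the abstract more concrete, the following is a minimal sketch of how an Online Frank-Wolfe update and a Hedge update might be combined in a meta-base structure. It is not the paper's algorithm: it omits the memory dependence of the losses entirely, and the feasible set (an l1 ball), the quadratic tracking loss, the drifting target, the step sizes, and all function names (`lmo_l1_ball`, `ofw_step`, `hedge_update`, `run_demo`) are assumptions chosen for illustration only.

```python
import numpy as np


def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle: argmin of <grad, v> over the l1 ball."""
    i = int(np.argmax(np.abs(grad)))
    v = np.zeros_like(grad, dtype=float)
    v[i] = -radius * np.sign(grad[i])  # vertex opposing the largest gradient entry
    return v


def ofw_step(x, grad, step, radius=1.0):
    """Frank-Wolfe update: a convex combination, so no projection is needed."""
    return (1.0 - step) * x + step * lmo_l1_ball(grad, radius)


def hedge_update(weights, losses, lr):
    """Hedge update: exponentially down-weight learners with larger loss."""
    w = weights * np.exp(-lr * (losses - losses.min()))
    return w / w.sum()


def run_demo(T=200, dim=3, steps=(0.01, 0.1, 0.5), lr=0.5):
    """Track a drifting target with base learners mixed by Hedge (illustrative)."""
    xs = [np.zeros(dim) for _ in steps]        # one base decision per step size
    w = np.full(len(steps), 1.0 / len(steps))  # uniform initial meta weights
    total_loss = 0.0
    x_meta = np.zeros(dim)
    for t in range(T):
        # Hypothetical time-varying optimum; it stays inside the unit l1 ball for dim=3.
        target = 0.3 * np.sin(0.05 * t + np.arange(dim))
        # Meta decision: weighted average of base decisions, hence still feasible.
        x_meta = sum(wi * xi for wi, xi in zip(w, xs))
        total_loss += float(np.sum((x_meta - target) ** 2))
        # Each base learner is scored on the same (assumed) quadratic loss.
        losses = np.array([np.sum((xi - target) ** 2) for xi in xs])
        w = hedge_update(w, losses, lr)
        # Base learners take one Frank-Wolfe step with their own step size.
        xs = [ofw_step(xi, 2.0 * (xi - target), s) for xi, s in zip(xs, steps)]
    return x_meta, w, total_loss
```

In this sketch, running several base learners with different step sizes and letting Hedge weight them is one common way to adapt to an unknown rate of change in the comparator sequence. Every iterate remains feasible because it is a convex combination of feasible points, which is what removes the need for a projection.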