Online learning and model reference adaptive control have many interesting intersections. One area where they differ, however, is in how the algorithms are analyzed and in what objective or metric is used to discriminate "good" algorithms from "bad" ones. In adaptive control there are usually two objectives: 1) proving that all time-varying parameters/states of the system remain bounded, and 2) showing that the instantaneous error between the adaptively controlled system and a reference system converges to zero over time (or at least to a compact set). In online learning, the performance of an algorithm is often characterized by the regret it incurs. Regret is defined as the cumulative loss (cost) over time of the online algorithm minus the cumulative loss (cost) of the single optimal fixed parameter choice in hindsight. Another significant difference between the two areas of research concerns the assumptions made to obtain these results. Adaptive control makes assumptions about the input-output properties of the control problem and derives solutions for a fixed error model or optimization task. In the online learning literature, results are derived for classes of loss functions (e.g., convex functions) while a priori assuming that certain signals are bounded. In this work we discuss these differences in detail through the regret-based analysis of gradient descent for convex functions and the control-based analysis of a streaming regression problem. We close with a discussion of the newly defined paradigm of online adaptive control.
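The regret notion above can be made concrete with a minimal sketch: online gradient descent run on a stream of scalar convex quadratic losses, with regret measured against the best fixed parameter in hindsight. The loss family, targets, and step-size schedule below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch (assumed setup): losses f_t(x) = (x - c_t)^2 with
# targets c_t drawn uniformly from [-1, 1]; these choices are ours.
rng = np.random.default_rng(0)
T = 1000
c = rng.uniform(-1.0, 1.0, size=T)  # per-round loss minimizers

x = 0.0          # online iterate x_t
online_loss = 0.0
for t in range(T):
    online_loss += (x - c[t]) ** 2   # suffer loss f_t(x_t)
    grad = 2.0 * (x - c[t])          # gradient of f_t at x_t
    x -= grad / np.sqrt(t + 1)       # OGD step with eta_t = 1/sqrt(t+1)

# For quadratic losses, the best fixed choice in hindsight is the mean of c_t.
x_star = c.mean()
hindsight_loss = np.sum((x_star - c) ** 2)

# Regret: cumulative online loss minus cumulative loss of x_star.
regret = online_loss - hindsight_loss
print(regret, regret / T)
```

With a decaying step size, standard results give regret growing like the square root of T for convex losses, so the average regret per round shrinks as the horizon grows; note that nothing in this analysis asks whether the iterates track a reference system, which is precisely the contrast drawn in the text.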