This text introduces online nonstochastic control, an emerging paradigm in the control of dynamical systems and differentiable reinforcement learning. The approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions and the perturbations from the assumed dynamical model are chosen by an adversary, so the optimal policy is not defined a priori. Instead, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests online convex optimization, a framework for sequential decision making, as the algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms and come with finite-time regret and computational complexity guarantees.
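To make the regret objective concrete, here is a minimal sketch of online gradient descent, a basic online convex optimization method, measured against the best fixed decision in hindsight. It is a toy illustration rather than a control algorithm from the text: the scalar quadratic losses, the interval decision set, the sign-flipping target sequence, and the names `project` and `ogd_regret` are all assumptions chosen for the example.

```python
import math


def project(x, radius):
    """Euclidean projection onto the interval [-radius, radius]."""
    return max(-radius, min(radius, x))


def ogd_regret(targets, radius=1.0):
    """Run online gradient descent on f_t(x) = (x - z_t)^2 and return its regret.

    Each z_t is revealed only after the learner commits to x_t, so the sequence
    may be chosen adversarially. Assumes every z_t lies in [-radius, radius].
    """
    diameter = 2.0 * radius      # D: width of the decision set
    grad_bound = 4.0 * radius    # G: |f_t'(x)| = 2|x - z_t| <= 4 * radius
    x = 0.0
    learner_loss = 0.0
    for t, z in enumerate(targets, start=1):
        learner_loss += (x - z) ** 2            # pay the loss for the committed x_t
        grad = 2.0 * (x - z)                    # gradient of f_t at x_t
        step = diameter / (grad_bound * math.sqrt(t))
        x = project(x - step * grad, radius)    # projected gradient step

    # Best fixed decision in hindsight: the mean minimizes a sum of squared errors.
    best_x = sum(targets) / len(targets)
    best_loss = sum((best_x - z) ** 2 for z in targets)
    return learner_loss - best_loss


# Hypothetical non-stochastic sequence: the target flips sign every 10 rounds.
T = 1000
targets = [1.0 if (t // 10) % 2 == 0 else -1.0 for t in range(T)]
regret = ogd_regret(targets)
print(f"regret = {regret:.2f}, per-round regret = {regret / T:.4f}")
```

With the step size D / (G * sqrt(t)) used here, the standard analysis of online gradient descent bounds the regret by (3/2) * G * D * sqrt(T) for any loss sequence satisfying the stated assumptions, so the per-round regret vanishes as T grows. In the control setting the comparator is a policy from a benchmark class rather than a single fixed point, but the accounting is the same.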