This paper considers the problem of real-time control and learning in dynamic systems subject to parametric uncertainties. A combination of Adaptive Control (AC) in the inner loop and a Reinforcement Learning (RL) based policy in the outer loop is proposed. In real time, the inner-loop AC contracts the closed-loop dynamics towards a reference system, and as the contraction takes hold, the outer-loop RL policy directs the overall system towards optimal performance. Two classes of control-affine nonlinear dynamic systems are considered. The first class is described by expansions around equilibrium points and is analyzed with a Lyapunov approach, while the second class is analyzed using contraction theory. AC-RL controllers are proposed for both classes of systems. Using a high-order tuner, these controllers are shown to yield online policies that guarantee stability while accommodating parametric uncertainties and magnitude limits on the input. In addition to establishing a stability guarantee for real-time control, the AC-RL controller is shown to achieve parameter learning under persistent excitation for the first class of systems. All algorithms are validated numerically on a quadrotor landing task on a moving platform, and the results point to a clear advantage of the proposed integrative AC-RL approach.
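The inner/outer-loop split described above can be illustrated with a deliberately small sketch. It is not the paper's method. It assumes a hypothetical scalar plant `x_dot = a*x + b*u` with unknown `a` and known sign of `b`, a designer-chosen stable reference model `xm_dot = am*xm + bm*r`, and a hypothetical `outer_policy` standing in for a trained RL policy that supplies the reference command `r`. In place of the paper's high-order tuner, it uses the standard first-order Lyapunov-based MRAC gradient laws, and it omits input magnitude limits. All parameter values are illustrative assumptions.

```python
import math


def simulate(T=20.0, dt=1e-3, gamma=5.0):
    """Scalar MRAC inner loop driven by a placeholder outer-loop command.

    Hypothetical plant:           x_dot  = a*x + b*u  (a unknown, sign(b) known)
    Hypothetical reference model: xm_dot = am*xm + bm*r
    Uses first-order gradient adaptive laws, NOT the paper's high-order tuner,
    and ignores input saturation.
    """
    a, b = 1.0, 2.0      # true (open-loop unstable) plant; the controller only uses sign(b)
    am, bm = -2.0, 2.0   # stable reference model chosen by the designer
    sgn_b = 1.0          # known sign of the input gain

    def outer_policy(t):
        # Hypothetical stand-in for an RL policy: a setpoint of 1.0 plus a small
        # sinusoid that keeps the regressor [x, r] persistently exciting.
        return 1.0 + 0.5 * math.sin(t)

    x, xm = 0.0, 0.0
    theta_x, theta_r = 0.0, 0.0  # adaptive gains, initialized away from the matching values
    errors = []
    steps = int(round(T / dt))
    for k in range(steps):
        t = k * dt
        r = outer_policy(t)
        u = theta_x * x + theta_r * r   # certainty-equivalence control law
        e = x - xm                      # tracking error with respect to the reference model
        # Lyapunov-based gradient adaptive laws (standard MRAC)
        dtheta_x = -gamma * sgn_b * e * x
        dtheta_r = -gamma * sgn_b * e * r
        # Forward-Euler integration of plant, reference model, and adaptive laws
        x += dt * (a * x + b * u)
        xm += dt * (am * xm + bm * r)
        theta_x += dt * dtheta_x
        theta_r += dt * dtheta_r
        errors.append(e)
    return errors, theta_x, theta_r


if __name__ == "__main__":
    errs, th_x, th_r = simulate()
    print("final |e| =", abs(errs[-1]), "theta_x =", th_x, "theta_r =", th_r)
```

For these assumed values, the matching gains are `theta_x* = (am - a)/b = -1.5` and `theta_r* = bm/b = 1`. The tracking error should shrink as adaptation proceeds, and with the sinusoidal excitation the estimates should drift toward the matching gains. This loosely mirrors the abstract's persistent-excitation claim for the first class of systems, without reproducing its guarantees under input limits.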