In many practical control applications, the performance of a closed-loop system degrades over time as plant characteristics change. There is therefore a strong need to redesign the controller without going through system modeling, which is often difficult for closed-loop systems. Reinforcement learning (RL) is a promising approach that enables model-free redesign of optimal controllers for nonlinear dynamical systems based only on measurements of the closed-loop system. However, the learning process of RL usually requires a considerable number of trial-and-error experiments on the poorly controlled system, which may accumulate wear on the plant. To overcome this limitation, we propose a model-free two-step design approach that improves the transient learning performance of RL in an optimal regulator redesign problem for unknown nonlinear systems. Specifically, we first design, in a model-free manner, a linear control law that attains some degree of control performance, and then train the nonlinear optimal control law with online RL while the designed linear control law runs in parallel. We introduce an offline RL algorithm for the design of the linear control law and theoretically guarantee its convergence to the LQR controller under mild assumptions. Numerical simulations show that the proposed approach improves both the transient learning performance of RL and the efficiency of its hyperparameter tuning.
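As a rough illustration of the two-step structure, the sketch below computes the LQR gain that the first step is meant to reach and then adds a learned term in parallel with it. It is not the paper's algorithm: it assumes the linearized matrices `A` and `B` are known and uses a plain Riccati recursion, whereas the approach described here obtains the gain from data without a model. The double-integrator plant, the weights `Q` and `R`, and the names `lqr_gain`, `combined_control`, and `residual_policy` are all hypothetical.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=500):
    """Return the discrete-time LQR gain K (for u = -K x) via Riccati recursion.

    Model-based reference only: A and B are assumed known here.
    """
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update written in closed-loop form: P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
    # Recompute the gain from the final P
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def combined_control(x, K, residual_policy):
    """Linear law plus a learned nonlinear term applied in parallel."""
    return -K @ x + residual_policy(x)


# Hypothetical linearized plant: double integrator sampled at 0.1 s
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)            # assumed state weight
R = np.array([[1.0]])    # assumed input weight

K = lqr_gain(A, B, Q, R)

# Spectral radius of the closed loop under the linear law alone
closed_loop_radius = max(abs(np.linalg.eigvals(A - B @ K)))

# Before any online learning, the residual term contributes nothing
u0 = combined_control(np.array([1.0, 0.0]), K, lambda x: np.zeros(1))
```

In this sketch the linear term already stabilizes the linearized plant, so an online learner that starts from a zero residual begins its trial-and-error from a reasonably behaved closed loop rather than from a poorly controlled one.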