In this paper, we introduce a new class of parameterized controllers inspired by Model Predictive Control (MPC). The controller resembles a Quadratic Programming (QP) solver for a linear MPC problem, but its parameters are trained via Deep Reinforcement Learning (DRL) rather than derived from system models. This approach addresses the limitations, in terms of verifiability and performance guarantees, of the Multi-Layer Perceptron (MLP) and other general neural network architectures commonly used as controllers in DRL: the learned controllers possess verifiable properties akin to those of MPC, such as persistent feasibility and asymptotic stability. Numerical examples illustrate that the proposed controller empirically matches MPC and MLP controllers in control performance and is more robust to modeling uncertainty and noise. Furthermore, the proposed controller is significantly more computationally efficient than MPC and requires fewer parameters to learn than MLP controllers. Real-world experiments on a vehicle drift maneuvering task demonstrate the potential of these controllers for robotics and other demanding control tasks.
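To make the idea of a controller that "resembles a QP solver" concrete, the following is a minimal sketch, not the architecture proposed in this paper. It assumes a box-constrained QP of the form min_z 0.5 z'Pz + (Hx)'z with lo <= z <= hi, approximated by a fixed number of projected-gradient iterations. The names `qp_unrolled_controller`, `L`, `H`, `lo`, `hi`, and `n_iter` are hypothetical, and the parameters are drawn at random here, whereas in the setting described above they would be trained by DRL.

```python
import numpy as np

def qp_unrolled_controller(x, L, H, lo, hi, n_iter=50):
    """Run n_iter projected-gradient steps on
    min_z 0.5 z'Pz + (Hx)'z  subject to  lo <= z <= hi,
    and return the first entry of z as the control input.
    L and H play the role of learnable parameters (illustrative assumption)."""
    n = L.shape[0]
    P = L @ L.T + 1e-3 * np.eye(n)            # positive definite by construction
    q = H @ x                                 # linear term depends on the state
    step = 1.0 / np.linalg.eigvalsh(P).max()  # 1 / Lipschitz constant of the gradient
    z = np.zeros(n)
    for _ in range(n_iter):
        # gradient step followed by projection onto the box constraints
        z = np.clip(z - step * (P @ z + q), lo, hi)
    return z[0], z

# Illustrative random parameters; a trained controller would learn these.
rng = np.random.default_rng(0)
L = rng.normal(size=(4, 4))
H = rng.normal(size=(4, 2))
x = np.array([1.0, -0.5])
u, z = qp_unrolled_controller(x, L, H, lo=-1.0, hi=1.0)
```

Because every iterate is projected onto the box, the output satisfies the input bounds regardless of the parameter values, which is one simple example of the kind of structural property that a generic MLP policy does not provide by construction.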