Imitation learning enables the synthesis of controllers for complex objectives and highly uncertain plant models. However, methods that provide stability guarantees for imitation-learned controllers often rely on large amounts of data, known plant models, or both. In this paper, we explore an input-output (IO) stability approach to dissipative imitation learning, which achieves stability with sparse data sets and little knowledge of the plant model. A closed-loop stable dynamic output feedback controller is learned using expert data, a coarse IO plant model, and a new constraint that enforces dissipativity of the learned controller. Although the learning objective is nonconvex, iterative convex overbounding (ICO) and projected gradient descent (PGD) are explored as methods for learning the controller. The new imitation learning method is applied to two unknown plants and compared with a traditionally learned dynamic output feedback controller and a neural network controller. With little knowledge of the plant model and a small data set, the dissipativity-constrained learned controller achieves closed-loop stability and successfully mimics the behavior of the expert controller, while the other methods often fail to maintain stability and achieve good performance.
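To make the idea of projected gradient descent under a dissipativity constraint concrete, the following is a minimal toy sketch and not the method of the paper. It assumes a static gain u = K y rather than a dynamic output feedback controller, and it uses passivity (K + Kᵀ positive semidefinite), one special case of dissipativity, as the constraint. The names `project_passive`, `fit_passive_gain`, and `K_expert`, as well as the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def project_passive(K):
    """Frobenius-norm projection onto {K : K + K^T is positive semidefinite}."""
    sym = 0.5 * (K + K.T)    # symmetric part, which determines passivity
    skew = 0.5 * (K - K.T)   # skew part, unaffected by the constraint
    eigvals, eigvecs = np.linalg.eigh(sym)
    # Clip negative eigenvalues of the symmetric part to zero.
    sym_psd = (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T
    return sym_psd + skew

def fit_passive_gain(Y, U, iters=500):
    """Minimise ||U - K Y||_F^2 over passive static gains K by PGD."""
    n = Y.shape[0]
    K = np.zeros((n, n))     # feasible starting point
    # Step 1/L, with L = 2 * lambda_max(Y Y^T) the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(Y @ Y.T, 2))
    for _ in range(iters):
        grad = 2.0 * (K @ Y - U) @ Y.T          # gradient of the imitation loss
        K = project_passive(K - step * grad)    # gradient step, then projection
    return K

rng = np.random.default_rng(0)
K_expert = np.array([[1.0, 2.0], [-2.0, -0.5]])  # hypothetical non-passive expert
Y = rng.standard_normal((2, 20))                 # small set of measured outputs
U = K_expert @ Y                                 # expert control actions
K_learned = fit_passive_gain(Y, U)
min_eig = np.linalg.eigvalsh(K_learned + K_learned.T).min()
```

In this toy setting the expert gain itself violates the constraint, so the learned gain trades imitation accuracy for passivity; `min_eig` should be nonnegative up to floating-point error. The controller class, constraint, and objective in the paper are more general, and the abstract notes that the actual learning problem is nonconvex.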