Imitation Learning (IL) can generate computationally efficient policies from demonstrations provided by Model Predictive Control (MPC). However, IL methods often require extensive data collection and training effort, which makes updating the policy costly when the task changes, and they produce policies with limited robustness to new disturbances. In this work, we propose an IL strategy to efficiently compress a computationally expensive MPC into a deep neural network policy that is robust to previously unseen disturbances. By using a robust variant of the MPC, called Robust Tube MPC, and leveraging properties of the controller, we introduce computationally efficient data augmentation methods that significantly reduce the number of MPC demonstrations and the training effort required to generate a robust policy. Our approach enables zero-shot transfer of a policy trained from a single MPC demonstration collected in a nominal domain, such as a simulation or a robot in a lab/controlled environment, to a new domain with previously unseen bounded model errors/perturbations. Numerical evaluations performed using linear and nonlinear MPC for agile flight on a multirotor show that our method outperforms strategies commonly employed in IL, such as Dataset Aggregation (DAgger) and Domain Randomization (DR), in terms of demonstration efficiency, training time, and robustness to perturbations unseen during training. Experimental evaluations validate the efficiency and real-world robustness of the approach.
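The core idea behind the tube-based data augmentation can be sketched as follows. In Robust Tube MPC, the applied control takes the form u = v + K(x − z), where (z, v) is the nominal state/input pair and K is the ancillary feedback gain; states inside the disturbance-invariant tube around z can therefore be labeled with the corresponding ancillary control at no extra MPC cost. The sketch below is a minimal, hypothetical illustration of this labeling step (the function name, the box approximation of the tube, and all numerical values are assumptions, not the paper's implementation):

```python
import numpy as np

def augment_demonstration(z_traj, v_traj, K, tube_halfwidth, n_samples, rng):
    """Generate extra (state, action) pairs from one nominal MPC demonstration.

    z_traj: (T, nx) nominal states; v_traj: (T, nu) nominal inputs;
    K: (nu, nx) ancillary feedback gain; tube_halfwidth: (nx,) box
    approximation of the disturbance-invariant set (an assumption here).
    """
    states, actions = [], []
    for z, v in zip(z_traj, v_traj):
        for _ in range(n_samples):
            # Sample a state error inside the (box-approximated) tube.
            e = rng.uniform(-tube_halfwidth, tube_halfwidth)
            states.append(z + e)
            # Label with the tube MPC ancillary control law u = v + K e.
            actions.append(v + K @ e)
    return np.array(states), np.array(actions)

# Toy usage with made-up dimensions and gains.
rng = np.random.default_rng(0)
T, nx, nu = 5, 2, 1
z_traj = rng.standard_normal((T, nx))
v_traj = rng.standard_normal((T, nu))
K = np.array([[-1.0, -0.5]])
X, U = augment_demonstration(z_traj, v_traj, K,
                             np.array([0.1, 0.1]), 10, rng)
```

Each of the T nominal states yields `n_samples` labeled pairs, so a single demonstration expands into a much larger training set without additional MPC solves, which is what drives the reported gains in demonstration efficiency.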