Deploying agile autonomous systems in challenging, unstructured environments requires both the ability to adapt and robustness to uncertainty. Existing robust and adaptive controllers, such as those based on model predictive control (MPC), can achieve impressive performance, but at the cost of heavy onboard computation at run time. Strategies that efficiently learn robust, onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities. In this work, we extend an existing efficient Imitation Learning (IL) algorithm for robust policy learning from MPC so that it can learn policies that adapt to challenging model and environment uncertainties. The key idea of our approach is to modify the IL procedure by conditioning the policy on a learned lower-dimensional representation of the model and environment that can be efficiently estimated online. We tailor our approach to the task of learning an adaptive position and attitude control policy for a multirotor that tracks trajectories under challenging disturbances. Evaluations in simulation show that a high-quality adaptive policy can be obtained in about $1.3$ hours. We also demonstrate empirically that the policy adapts rapidly to uncertainties both inside and outside the training distribution, achieving an average position error of $6.1$ cm under wind disturbances that correspond to about $50\%$ of the robot's weight and are $36\%$ larger than the maximum wind seen during training.
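To make the conditioning idea concrete, the following is a minimal sketch of the data flow only, not the authors' implementation. It assumes a two-stage setup in which an encoder compresses the uncertainty parameters (known in simulation) into a low-dimensional code during training, and a separate estimator recovers that code online from a short history of states and actions. That split, all dimensions, the single-layer networks, and every name (`encoder`, `estimator`, `policy`, `act`) are hypothetical and chosen for illustration; the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM = 10, 4   # robot state and control command
PARAM_DIM, LATENT_DIM = 8, 3    # uncertainty parameters and their compressed code
HISTORY_LEN = 20                # past (state, action) pairs used by the online estimator


def linear(n_in, n_out):
    """Return randomly initialised (untrained) weights and bias for one layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


def forward(layer, x):
    """Apply one linear layer followed by a tanh nonlinearity."""
    weights, bias = layer
    return np.tanh(x @ weights + bias)


# Assumed components: each is a single layer here to keep the sketch short.
encoder = linear(PARAM_DIM, LATENT_DIM)
estimator = linear(HISTORY_LEN * (STATE_DIM + ACTION_DIM), LATENT_DIM)
policy = linear(STATE_DIM + LATENT_DIM, ACTION_DIM)


def act(state, latent):
    """Policy conditioned on the low-dimensional representation."""
    return forward(policy, np.concatenate([state, latent]))


state = rng.normal(size=STATE_DIM)

# Training time (assumption): the parameters are available from the simulator.
params = rng.normal(size=PARAM_DIM)
z_train = forward(encoder, params)
u_train = act(state, z_train)

# Deployment (assumption): the code is estimated from recent states and actions.
history = rng.normal(size=(HISTORY_LEN, STATE_DIM + ACTION_DIM))
z_online = forward(estimator, history.ravel())
u_online = act(state, z_online)
```

In this sketch the policy never sees the full parameter vector, only the `LATENT_DIM`-sized code, so the same `act` function can be used whether the code comes from the encoder or from the online estimator.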