Imitation learning (IL) can train computationally efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC), but it often requires many samples, leading to long training times or limited robustness. To address these issues, we combine IL with a variant of robust MPC that accounts for process and sensing uncertainties, and we design a data augmentation (DA) strategy that enables efficient learning of vision-based policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance Fields (NeRFs) to generate novel synthetic images, and uses properties of the robust MPC (the tube) to select relevant views and to efficiently compute the corresponding actions. We tailor our approach to the task of localization and trajectory tracking on a multirotor by learning a visuomotor policy that generates control actions using images from the onboard camera as the only source of horizontal position. Numerical evaluations demonstrate learning of a robust visuomotor policy with an 80-fold increase in demonstration efficiency and a 50% reduction in training time compared with current IL methods. Additionally, our policies successfully transfer to a real multirotor, achieving accurate localization and low tracking errors despite large disturbances, with an onboard inference time of only 1.5 ms.
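In tube MPC, the tube is commonly maintained by an ancillary feedback controller of the form u = ū + K(x − x̄), where (x̄, ū) is the nominal state and action and K is a feedback gain. Assuming such a linear ancillary law and a box-shaped tube (the abstract specifies neither), the sketch shows how tube properties could label extra states sampled inside the tube with actions, without re-solving the MPC for each sample. All names (`ancillary_action`, `sample_in_tube`, `augment_step`) are hypothetical, and the step that would render a camera image at each sampled pose with a NeRF is deliberately omitted.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def ancillary_action(x: Vector, x_nom: Vector, u_nom: Vector, K: Matrix) -> Vector:
    """Assumed linear ancillary law: u = u_nom + K (x - x_nom)."""
    dx = [xi - xni for xi, xni in zip(x, x_nom)]
    return [u + sum(k * d for k, d in zip(row, dx)) for u, row in zip(u_nom, K)]


def sample_in_tube(x_nom: Vector, half_widths: Vector, n_samples: int,
                   rng: random.Random) -> List[Vector]:
    """Draw states uniformly from an (assumed) axis-aligned box tube around x_nom."""
    return [[xn + rng.uniform(-w, w) for xn, w in zip(x_nom, half_widths)]
            for _ in range(n_samples)]


def augment_step(x_nom: Vector, u_nom: Vector, K: Matrix, half_widths: Vector,
                 n_samples: int, seed: int = 0) -> List[Tuple[Vector, Vector]]:
    """Return (state, action) pairs for extra states inside the tube.

    In a vision-based pipeline, each sampled state would also be paired with a
    synthetic image rendered from that pose (not shown here).
    """
    rng = random.Random(seed)
    states = sample_in_tube(x_nom, half_widths, n_samples, rng)
    return [(x, ancillary_action(x, x_nom, u_nom, K)) for x in states]


if __name__ == "__main__":
    # Toy 2D example: state = horizontal position (x, y), action = 2D command.
    K = [[-2.0, 0.0], [0.0, -2.0]]
    pairs = augment_step([1.0, 0.5], [0.1, -0.2], K, [0.2, 0.1], n_samples=3)
    for state, action in pairs:
        print(state, action)
```

The design choice this illustrates is that labeling an augmented sample costs only a matrix-vector product, which is why a tube-based feedback law can make action computation for many synthetic views cheap compared with solving the MPC for each one.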