Safe learning of locomotion skills is still an open problem. Because the open-loop dynamics of locomotion systems are intrinsically unstable, naive learning from scratch is prone to catastrophic failures in the real world. In this work, we investigate the use of iterative algorithms to safely learn locomotion skills from model predictive control (MPC). In our framework, we use MPC as an expert and take inspiration from the safe data aggregation (SafeDAGGER) framework to minimize the number of failures during training of the policy. Through a comparison with other standard approaches such as behavior cloning and vanilla DAGGER, we show that our approach not only incurs substantially fewer failures during training, but also yields a policy that is more robust to external disturbances.
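The gating idea can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes a toy scalar system with unstable open-loop dynamics, a linear feedback law as a hypothetical stand-in for the MPC expert, and a linear learner fit by least squares. The names `expert_action`, `fit_policy`, `train`, and the constants `A`, `B`, `X_FAIL`, `TAU` are all illustrative. The safety gate here queries the expert directly and compares actions, which is a simplification chosen for brevity.

```python
import numpy as np

A, B = 1.2, 1.0   # toy unstable dynamics x' = A*x + B*u (assumption)
X_FAIL = 5.0      # |x| beyond this counts as a failure (assumption)
TAU = 0.3         # action-discrepancy threshold for the safety gate (assumption)


def expert_action(x):
    """Hypothetical stand-in for the MPC expert: a stabilizing linear feedback."""
    return -1.0 * x


def fit_policy(states, actions):
    """Least-squares fit of a linear policy u = w * x to the aggregated data."""
    s, a = np.asarray(states), np.asarray(actions)
    return float((s @ a) / (s @ s))


def train(gated, n_iters=5, horizon=30, seed=0):
    """Iteratively roll out, aggregate expert labels, and refit the learner.

    gated=True : the expert takes over (and labels the state) whenever the
                 learner's action deviates from the expert's by more than TAU.
    gated=False: the learner always acts and every visited state is labeled.
    Returns the learned weight and the number of failed rollouts.
    """
    rng = np.random.default_rng(seed)
    states, actions = [], []
    w, failures = 0.0, 0          # untrained learner outputs u = 0
    for _ in range(n_iters):
        x = rng.uniform(0.5, 1.0)  # initial state away from the origin
        for _ in range(horizon):
            u_exp, u_learn = expert_action(x), w * x
            unsafe = abs(u_learn - u_exp) > TAU
            if unsafe or not gated:
                states.append(x)
                actions.append(u_exp)
            # Only the gated variant lets the expert override the learner.
            u = u_exp if (gated and unsafe) else u_learn
            x = A * x + B * u + rng.normal(scale=0.01)
            if abs(x) > X_FAIL:
                failures += 1
                break
        w = fit_policy(states, actions)
    return w, failures


if __name__ == "__main__":
    print("gated:  ", train(gated=True))
    print("ungated:", train(gated=False))
```

On this toy problem both variants recover the expert's gain, but the ungated variant lets the untrained learner diverge in its first rollout, whereas the gated variant hands control back to the expert before the state leaves the safe region. The toy says nothing about the quantitative results reported here; it only shows where the failures during training come from.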