Model Predictive Contouring Control (MPCC) has recently emerged as the state-of-the-art approach for model-based agile flight. MPCC offers great flexibility in trading off progress maximization against path following at runtime, without relying on globally optimized trajectories. However, finding the optimal set of tuning parameters for MPCC is challenging because (i) the full quadrotor dynamics are nonlinear, (ii) the cost function is highly non-convex, and (iii) the hyperparameter space is high-dimensional. This paper leverages a probabilistic policy search method, Weighted Maximum Likelihood (WML), to automatically learn the optimal objective for MPCC. WML is sample-efficient because its update of the learning parameters has a closed-form solution. Additionally, the data efficiency of the model-based approach allows us to train directly in a high-fidelity simulator, which in turn enables zero-shot transfer to the real world. We validate our approach in the real world, where our method outperforms both the previous manually tuned controller and the state-of-the-art auto-tuning baseline, reaching speeds of 75 km/h.
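To make the closed-form update mentioned in the abstract concrete, the following is a minimal sketch of a weighted-maximum-likelihood step for a Gaussian search distribution over controller parameters. It is not the authors' implementation: the diagonal covariance, the exponential reward weighting with temperature `beta`, the function names, and the toy quadratic reward (standing in for a simulated MPCC rollout) are all assumptions chosen for illustration.

```python
import math
import random


def wml_update(samples, rewards, beta=5.0):
    """One closed-form WML update of a diagonal Gaussian (illustrative sketch).

    samples: list of parameter vectors drawn from the current distribution
    rewards: one scalar reward per sample (higher is better)
    beta:    assumed temperature controlling how greedy the weighting is
    """
    r_max = max(rewards)
    span = (r_max - min(rewards)) or 1.0  # avoid dividing by zero
    # Exponentiated, normalized rewards act as per-sample weights.
    weights = [math.exp(beta * (r - r_max) / span) for r in rewards]
    total = sum(weights)
    dim = len(samples[0])
    # Weighted maximum likelihood estimates of mean and variance.
    mean = [
        sum(w * s[d] for w, s in zip(weights, samples)) / total
        for d in range(dim)
    ]
    var = [
        sum(w * (s[d] - mean[d]) ** 2 for w, s in zip(weights, samples)) / total
        for d in range(dim)
    ]
    return mean, var


def toy_reward(theta, target=(2.0, -1.0)):
    """Hypothetical stand-in for a rollout score: negative squared distance."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def run_search(iterations=30, n_samples=40, seed=0):
    """Sample, evaluate, and update repeatedly, starting from a broad prior."""
    rng = random.Random(seed)
    mean, var = [0.0, 0.0], [4.0, 4.0]
    for _ in range(iterations):
        samples = [
            [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
            for _ in range(n_samples)
        ]
        rewards = [toy_reward(s) for s in samples]
        mean, var = wml_update(samples, rewards)
        var = [max(v, 1e-6) for v in var]  # keep the variance strictly positive
    return mean, var


if __name__ == "__main__":
    final_mean, final_var = run_search()
    print("mean:", final_mean, "variance:", final_var)
```

Because the new mean and variance are just reward-weighted sample statistics, each update needs no gradient computation or inner optimization loop, which is the property the abstract credits for sample efficiency. In a real setting, `toy_reward` would be replaced by a score computed from a simulated flight with the sampled MPCC cost weights.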