Model predictive control (MPC) has been applied to many platforms in robotics and autonomous systems because it can predict a system's future behavior while accounting for the system's constraints. One way to improve the performance of an MPC-controlled system is to tune the MPC's cost function by hand. However, manual tuning can be challenging because the parameter space may be high-dimensional and because the open-loop cost function used inside the MPC may differ from the overall closed-loop performance metric. This paper presents DiffTune-MPC, a novel method for learning the cost function of an MPC in a closed-loop manner. The proposed framework accommodates the case where the time interval used for performance evaluation and the MPC's planning horizon have different lengths. We present the auxiliary problem whose solution yields the analytical gradients of the MPC and discuss its variations in different MPC settings. Simulation results demonstrate the capability of DiffTune-MPC and illustrate the influence of constraints (from actuation limits) on learning.
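To make the idea of closed-loop tuning concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical scalar plant `x_next = A*x + B*u`, an unconstrained linear-quadratic MPC whose only tunable parameter is the control weight `r`, and a closed-loop metric (sum of squared states over `T_EVAL` steps) that deliberately differs from the MPC's own cost and uses a different length than the planning horizon `N_PLAN`. The gradient of the closed-loop loss is obtained by propagating the state sensitivity `dx/dr` along the rollout. A central finite difference stands in for the analytical MPC gradient that the abstract attributes to the auxiliary problem. All names and numerical values here are illustrative assumptions.

```python
# Hypothetical scalar plant and horizons (illustrative values only).
A, B = 1.0, 0.5   # x_next = A*x + B*u
Q = 1.0           # fixed state weight in the MPC cost
N_PLAN = 5        # MPC planning horizon
T_EVAL = 20       # closed-loop evaluation length (differs from N_PLAN)


def mpc_gain(r):
    """First-step feedback gain of an unconstrained scalar LQ MPC.

    Runs the backward Riccati recursion over N_PLAN steps with
    stage cost Q*x^2 + r*u^2 and terminal weight Q.
    """
    p = Q
    k = 0.0
    for _ in range(N_PLAN):
        k = A * B * p / (r + B * B * p)
        p = Q + A * A * p - A * B * p * k
    return k


def rollout_loss_and_grad(r, x0=1.0, eps=1e-6):
    """Closed-loop loss sum(x_k^2) and its derivative with respect to r."""
    k = mpc_gain(r)
    # Finite-difference stand-in for the analytical MPC gradient.
    dk_dr = (mpc_gain(r + eps) - mpc_gain(r - eps)) / (2.0 * eps)
    x, s = x0, 0.0          # s tracks dx/dr; the initial state does not depend on r
    loss, grad = 0.0, 0.0
    for _ in range(T_EVAL):
        du_dx = -k          # controller is u = -k(r) * x
        du_dr = -dk_dr * x
        # Sensitivity propagation through the closed loop (chain rule).
        s = (A + B * du_dx) * s + B * du_dr
        x = A * x + B * (-k * x)
        loss += x * x
        grad += 2.0 * x * s
    return loss, grad


def tune(r=1.0, steps=25, lr=0.2, r_min=1e-3):
    """Projected gradient descent on r; returns the final r and the loss history."""
    history = []
    for _ in range(steps):
        loss, grad = rollout_loss_and_grad(r)
        history.append(loss)
        r = max(r - lr * grad, r_min)   # keep the control weight positive
    history.append(rollout_loss_and_grad(r)[0])
    return r, history
```

Calling `tune()` returns the tuned weight and the loss recorded at each iterate. Because this toy metric penalizes only the state, the update drives `r` down toward its lower bound. Actuation limits, which the abstract identifies as influencing learning, are not modeled in this sketch.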