Combining supervised learning with model predictive control (MPC) has recently generated significant interest, particularly in approximate explicit MPC, where function approximators such as deep neural networks learn the MPC policy from optimal state-action pairs generated offline. Although approximate explicit MPC aims to closely replicate the MPC policy by replacing online optimization with a trained neural network, the performance guarantees that come with solving the online optimization problem are typically lost. This paper considers an alternative strategy in which supervised learning is used to learn the optimal value function offline, rather than the optimal policy. The learned value function can then serve as the cost-to-go function in a myopic MPC with a very short prediction horizon, so that the online computational burden is reduced significantly without affecting controller performance. This approach differs from existing work on value function approximation in that it learns the cost-to-go function from offline-collected state-value pairs rather than from closed-loop performance data. The cost of generating the state-value pairs used for training is addressed with a sensitivity-based data augmentation scheme.
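The idea of learning a cost-to-go function offline and using it in a one-step MPC can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar linear system and its parameters `a`, `b`, `q`, `r`, the use of a Riccati recursion to label states with their optimal long-horizon cost, and a polynomial least-squares fit standing in for the neural network. The sensitivity-based data augmentation step is not shown.

```python
import numpy as np

# Hypothetical scalar system x_next = a*x + b*u with stage cost q*x**2 + r*u**2.
a, b, q, r = 1.2, 1.0, 1.0, 0.5


def long_horizon_value(x0, horizon=50):
    """Optimal cost of the long-horizon problem from x0 (backward Riccati recursion)."""
    p = 0.0  # zero terminal cost
    for _ in range(horizon):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p * x0 ** 2


# Offline: sample states and label each with its optimal long-horizon cost.
states = np.linspace(-2.0, 2.0, 41)
values = np.array([long_horizon_value(x) for x in states])

# Supervised learning step: least-squares fit of V_hat(x) = c0 + c1*x + c2*x**2.
features = np.column_stack([np.ones_like(states), states, states ** 2])
coeffs, *_ = np.linalg.lstsq(features, values, rcond=None)


def value_hat(x):
    """Learned cost-to-go approximation."""
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x ** 2


def myopic_mpc_action(x):
    """One-step MPC: minimize q*x**2 + r*u**2 + V_hat(a*x + b*u) over u."""
    _, c1, c2 = coeffs
    # The objective is quadratic in u, so the stationarity condition
    # 2*r*u + b*(c1 + 2*c2*(a*x + b*u)) = 0 can be solved directly.
    return -b * (c1 + 2.0 * c2 * a * x) / (2.0 * (r + c2 * b * b))


# Reference: the converged long-horizon feedback u = -K*x, for comparison.
p_ref = long_horizon_value(1.0)
K = a * b * p_ref / (r + b * b * p_ref)
print(myopic_mpc_action(1.0), -K * 1.0)
```

In this linear-quadratic toy case the one-step controller with the learned cost-to-go should reproduce the long-horizon feedback, because the fitted quadratic matches the true value function. For nonlinear or constrained problems the one-step problem would generally need a numerical solver, and the quality of the fit would determine how close the myopic controller gets to the long-horizon one.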