(Economic) nonlinear model predictive control ((e)NMPC) requires dynamic system models that are sufficiently accurate in all relevant state-space regions. These models must also be computationally cheap enough to ensure real-time tractability. Data-driven surrogates of mechanistic models can reduce the computational burden of (e)NMPC. However, such surrogates are typically trained by system identification to maximize average prediction accuracy on simulation samples, and they perform suboptimally when embedded in an actual (e)NMPC. We present a method for end-to-end reinforcement learning of dynamic surrogate models for optimal performance in (e)NMPC applications. The resulting predictive controllers strike a favorable balance between control performance and computational demand. We validate our method on two applications derived from an established nonlinear continuous stirred-tank reactor model. We compare the resulting controllers with MPCs that use models trained under the prevailing maximum-prediction-accuracy paradigm and with model-free neural network controllers trained by reinforcement learning. We show that our method matches the performance of the model-free neural network controllers while consistently outperforming MPCs based on models derived from system identification. Additionally, we show that the MPC policies can react to changes in the control setting without retraining.
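The core contrast, tuning a surrogate's parameters for closed-loop control performance instead of for prediction accuracy, can be illustrated with a small sketch. Everything below is an assumption for illustration only and is not the authors' method: a hypothetical scalar nonlinear plant (not the CSTR model), a linear surrogate `x+ = a*x + b*u`, an MPC that enumerates a small discrete input set over a short horizon, and plain random search on the closed-loop episode cost as a simple stand-in for the reinforcement learning algorithm used in the paper. All function names and constants are hypothetical.

```python
import itertools
import random

# Hypothetical toy setup (assumptions, not taken from the paper).
DT = 0.1
U_SET = [-1.0, -0.5, 0.0, 0.5, 1.0]  # discrete candidate inputs
HORIZON = 3                          # MPC prediction horizon
X_REF = 0.8                          # setpoint
LAM = 0.01                           # input penalty weight


def plant(x, u):
    """Hypothetical nonlinear 'true' plant (a stand-in, not the CSTR model)."""
    return x + DT * (-x**3 + u)


def surrogate(theta, x, u):
    """Linear surrogate model x+ = a*x + b*u with parameters theta = (a, b)."""
    a, b = theta
    return a * x + b * u


def mpc_action(theta, x):
    """Enumerate all input sequences over the horizon, predict with the
    surrogate, and return the first input of the cheapest sequence."""
    best_u, best_cost = 0.0, float("inf")
    for seq in itertools.product(U_SET, repeat=HORIZON):
        xp, cost = x, 0.0
        for u in seq:
            xp = surrogate(theta, xp, u)
            cost += (xp - X_REF) ** 2 + LAM * u ** 2
        if cost < best_cost:
            best_u, best_cost = seq[0], cost
    return best_u


def closed_loop_cost(theta, x0=0.0, steps=40):
    """Episode cost of the surrogate-based MPC applied to the true plant
    (the negated return an RL agent would maximize)."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = mpc_action(theta, x)
        x = plant(x, u)
        total += (x - X_REF) ** 2 + LAM * u ** 2
    return total


def fit_sysid(n=200, seed=0):
    """Accuracy paradigm: least-squares fit of (a, b) on random one-step samples."""
    rng = random.Random(seed)
    sxx = sxu = suu = sxy = suy = 0.0
    for _ in range(n):
        x, u = rng.uniform(-1.5, 1.5), rng.choice(U_SET)
        y = plant(x, u)
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    # Solve the 2x2 normal equations for (a, b).
    det = sxx * suu - sxu * sxu
    return ((sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det)


def tune_closed_loop(theta0, iters=40, sigma=0.05, seed=1):
    """Random-search stand-in for RL (an assumption): perturb theta and keep
    the candidate only if the closed-loop cost on the true plant decreases."""
    rng = random.Random(seed)
    best, best_cost = theta0, closed_loop_cost(theta0)
    for _ in range(iters):
        cand = (best[0] + rng.gauss(0, sigma), best[1] + rng.gauss(0, sigma))
        c = closed_loop_cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


if __name__ == "__main__":
    theta_id = fit_sysid()
    cost_id = closed_loop_cost(theta_id)
    theta_cl, cost_cl = tune_closed_loop(theta_id)
    print(f"SysID theta={theta_id}, closed-loop cost={cost_id:.4f}")
    print(f"Tuned theta={theta_cl}, closed-loop cost={cost_cl:.4f}")
```

Because `tune_closed_loop` starts from the system-identification parameters and only accepts perturbations that lower the closed-loop cost, the tuned surrogate can never do worse than the accuracy-trained one on this episode; how much better it does depends entirely on the toy setup.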