Reinforcement learning (RL) can tune data-driven (economic) nonlinear model predictive controllers ((e)NMPCs) for optimal performance on a specific control task by optimizing the dynamic model or the parameters in the policy's objective function or constraints, e.g., state bounds. However, RL's sample efficiency is critical. To improve it, we combine a model-based RL algorithm with our previously published method that turns Koopman (e)NMPCs into automatically differentiable policies. We apply the approach to an eNMPC case study based on a continuous stirred-tank reactor (CSTR) model from the literature. Our approach achieves superior control performance and higher sample efficiency than two benchmark methods: data-driven eNMPCs whose models stem from system identification, without subsequent RL tuning of the resulting policy, and neural network controllers trained with model-based RL. Moreover, exploiting partial prior knowledge of the system dynamics via physics-informed learning further increases sample efficiency.