We present a method for end-to-end learning of Koopman surrogate models for optimal performance in a specific control task. In contrast to previous contributions that employ standard reinforcement learning (RL) algorithms, we use a training algorithm that exploits the potential differentiability of environments based on mechanistic simulation models to aid the policy optimization. We evaluate the performance of our method by comparing it to that of other combinations of controller types and training algorithms on an existing economic nonlinear model predictive control (eNMPC) case study of a continuous stirred-tank reactor (CSTR) model. Compared to the benchmark methods, our method achieves similar economic performance while causing considerably fewer and less severe constraint violations. Thus, for this case study, our method outperforms the others and offers a promising path toward more performant controllers that employ dynamic surrogate models.