The goal of this paper is to make a strong case for the use of dynamical models when applying reinforcement learning (RL) to feedback control of dynamical systems governed by partial differential equations (PDEs). Two main challenges stand in the way of bridging the gap between the immense promise of RL and its applicability to complex engineering systems: the massive training data requirements and the lack of performance guarantees. We present a solution for the first issue using a data-driven surrogate model in the form of a convolutional LSTM with actuation. We demonstrate that learning an actuated model in parallel with training the RL agent significantly reduces the total amount of data that must be sampled from the real system. Furthermore, we show that iteratively updating the model is of major importance to avoid biases in the RL training. Detailed ablation studies reveal the most important ingredients of the modeling process. We use the chaotic Kuramoto-Sivashinsky equation to demonstrate our findings.
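The alternating scheme described above (sample from the real system, refit the actuated surrogate, improve the controller on the surrogate only) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not the setup of this paper: the "real system" is a hypothetical scalar linear map instead of a PDE, the surrogate is a least-squares fit instead of a convolutional LSTM, and the policy update is a plain random search standing in for an RL algorithm. All names (`RealSystem`, `fit_surrogate`, `improve_policy`, `train`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


class RealSystem:
    """Hypothetical stand-in for the expensive real system (not a PDE)."""

    def __init__(self, a=1.2, b=0.5):
        self.a, self.b = a, b
        self.n_queries = 0  # counts every transition sampled from the system

    def step(self, x, u):
        self.n_queries += 1
        return self.a * x + self.b * u


def fit_surrogate(data):
    """Least-squares fit of x_next ~ a*x + b*u (stand-in for the learned model)."""
    X = np.array([[x, u] for x, u, _ in data])
    y = np.array([x_next for _, _, x_next in data])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, b


def surrogate_cost(k, a, b, x0=1.0, horizon=20):
    """Quadratic cost of the feedback law u = -k*x, rolled out on the surrogate."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x**2 + 0.1 * u**2
        x = a * x + b * u
    return cost


def improve_policy(k, a, b, n_candidates=50, scale=0.5):
    """Random-search policy update (stand-in for RL); uses the surrogate only."""
    candidates = k + scale * rng.standard_normal(n_candidates)
    costs = [surrogate_cost(c, a, b) for c in candidates]
    best = int(np.argmin(costs))
    return candidates[best] if costs[best] < surrogate_cost(k, a, b) else k


def train(n_iters=5, rollout_len=10):
    system, data, k = RealSystem(), [], 0.0
    for _ in range(n_iters):
        # 1) Collect a short rollout from the real system with the current policy.
        x = 1.0
        for _ in range(rollout_len):
            u = -k * x + 0.1 * rng.standard_normal()  # exploration noise
            x_next = system.step(x, u)
            data.append((x, u, x_next))
            x = x_next
        # 2) Refit the surrogate on all data gathered so far.
        a, b = fit_surrogate(data)
        # 3) Improve the policy without touching the real system.
        k = improve_policy(k, a, b)
    return k, (a, b), system.n_queries
```

Calling `train()` returns the feedback gain, the fitted surrogate parameters, and the number of real-system transitions used. In this toy setting the policy search evaluates many candidate rollouts per iteration, yet only `n_iters * rollout_len` transitions are ever drawn from the real system; refitting in every iteration means the surrogate is also trained on states visited by the current policy, not only on those visited by the initial one.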