The goal of this paper is to make a strong case for the use of dynamical models when applying reinforcement learning (RL) to the feedback control of dynamical systems governed by partial differential equations (PDEs). To bridge the gap between the immense promise of RL and its applicability to complex engineering systems, two main challenges must be overcome: the massive amount of training data required and the lack of performance guarantees. We address the first issue with a data-driven surrogate model in the form of a convolutional LSTM with actuation. We demonstrate that learning an actuated model in parallel with training the RL agent significantly reduces the total amount of data that must be sampled from the real system. Furthermore, we show that iteratively updating the model is of major importance for avoiding biases in the RL training. Detailed ablation studies reveal the most important ingredients of the modeling process. We use the chaotic Kuramoto-Sivashinsky equation to demonstrate our findings.
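To make concrete what "sampling data from the real system" means for the Kuramoto-Sivashinsky test case, the sketch below advances an actuated KS equation, u_t + u u_x + u_xx + u_xxxx = f(x, t), by one time step on a periodic domain. It is only an illustration under stated assumptions, not the paper's setup: the additive forcing f, the Gaussian actuator layout in the hypothetical helper `actuation_field`, the first-order semi-implicit (IMEX) Euler scheme in `ks_step`, and the parameters L = 22, n = 64, dt = 0.05 are all choices made here for illustration.

```python
import numpy as np


def actuation_field(x, a, L, width=0.8):
    """Hypothetical actuation: Gaussian bumps at equally spaced actuator
    positions on the periodic domain [0, L), each scaled by one action a[j]."""
    centers = np.linspace(0.0, L, len(a), endpoint=False)
    # Signed periodic distance from every grid point to every actuator centre.
    d = (x[:, None] - centers[None, :] + L / 2) % L - L / 2
    return (np.exp(-0.5 * (d / width) ** 2) * a[None, :]).sum(axis=1)


def ks_step(u, f, dt, L):
    """One semi-implicit (IMEX) Euler step of the forced KS equation
        u_t = -u u_x - u_xx - u_xxxx + f
    on a periodic grid: linear terms implicit, nonlinear term and forcing
    explicit, evaluated pseudo-spectrally (no dealiasing)."""
    n = u.size
    k = 2 * np.pi * np.fft.rfftfreq(n, d=L / n)   # angular wavenumbers
    lin = k**2 - k**4                             # Fourier symbol of -d_xx - d_xxxx
    u_hat = np.fft.rfft(u)
    nonlin_hat = -0.5j * k * np.fft.rfft(u**2)    # transform of -(u^2 / 2)_x
    f_hat = np.fft.rfft(f)
    u_hat_new = (u_hat + dt * (nonlin_hat + f_hat)) / (1.0 - dt * lin)
    return np.fft.irfft(u_hat_new, n=n)


if __name__ == "__main__":
    L, n, dt = 22.0, 64, 0.05                     # illustrative choices only
    x = np.linspace(0.0, L, n, endpoint=False)
    rng = np.random.default_rng(0)
    u = 0.1 * rng.standard_normal(n)
    for _ in range(2000):
        a = rng.uniform(-0.5, 0.5, size=4)        # random actions, 4 hypothetical actuators
        u = ks_step(u, actuation_field(x, a, L), dt, L)
    print(u.min(), u.max())
```

Calling `ks_step` repeatedly with an agent's actions yields the state/action trajectories on which an actuated surrogate, such as the convolutional LSTM described above, could be fitted. Each real-system step is exactly what a model-based approach tries to economize. For quantitative work, a higher-order scheme (for example ETDRK4) with dealiasing would be the more usual choice.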