The coupling of deep reinforcement learning with numerical flow control problems has recently received considerable attention, leading to groundbreaking results and opening new perspectives for the domain. Because fluid dynamics solvers are usually computationally expensive, running parallel environments during learning is essential to obtain efficient control within a reasonable time. Yet most of the deep reinforcement learning literature on flow control relies on on-policy algorithms, for which massively parallel collection of transitions may break theoretical assumptions and lead to suboptimal control models. To overcome this issue, we propose a parallelism pattern based on partial-trajectory buffers terminated by a return-bootstrapping step, which allows a flexible use of parallel environments while preserving the on-policy nature of the updates. The approach is illustrated on a CPU-intensive continuous flow control problem from the literature.
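To make the return-bootstrapping step concrete, here is a minimal sketch of how discounted returns can be computed for one partial-trajectory buffer. It is an illustration under stated assumptions, not the authors' implementation: it assumes a discount factor `gamma` and a critic estimate `last_value` for the state at which the buffer was cut, and the function name `bootstrapped_returns` is hypothetical. When a buffer ends mid-episode, the unobserved tail of the return is replaced by `last_value`; when it ends on a true terminal state, the tail is zero.

```python
from typing import List


def bootstrapped_returns(
    rewards: List[float], last_value: float, gamma: float, terminal: bool
) -> List[float]:
    """Discounted returns for one partial-trajectory buffer.

    Hypothetical helper for illustration only. If the buffer was cut
    before the episode ended (terminal=False), the return beyond the
    last stored transition is approximated by the critic estimate
    `last_value`; otherwise the tail contributes nothing.
    """
    # Seed the backward recursion with the bootstrap value (or 0 at a terminal state).
    running = 0.0 if terminal else last_value
    returns = [0.0] * len(rewards)
    # Walk backwards: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Same rewards, once as a truncated buffer and once as a finished episode.
    print(bootstrapped_returns([1.0, 1.0, 1.0], last_value=10.0, gamma=0.5, terminal=False))
    # → [3.0, 4.0, 6.0]
    print(bootstrapped_returns([1.0, 1.0, 1.0], last_value=10.0, gamma=0.5, terminal=True))
    # → [1.75, 1.5, 1.0]
```

In a scheme of this kind, each parallel environment could fill a fixed-length buffer and close it with such a bootstrap instead of waiting for its episode to finish, so every return used in an update is still built from transitions collected with the current policy.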