Deep Reinforcement Learning (DRL) has emerged as a promising approach to highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost of training DRL models remains a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing (HPC) architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We validate an existing state-of-the-art DRL framework for AFC problems and discuss its efficiency bottlenecks. Next, we decompose the framework and conduct extensive scalability benchmarks on its individual components, which lets us investigate various hybrid parallelization configurations and propose efficient parallelization strategies. We also refine input/output (I/O) operations in multi-environment DRL training to reduce the critical overhead caused by data movement. Finally, we demonstrate the optimized framework on a typical AFC problem, for which the overall framework achieves near-linear scaling. Parallel efficiency improves from around 49% to approximately 78%, and training is accelerated approximately 47-fold on 60 CPU cores. These findings are expected to provide valuable insights for further advances in DRL-based AFC studies.
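The reported speedup and efficiency figures can be cross-checked with the standard strong-scaling definitions: speedup S = T_1 / T_p and parallel efficiency E = S / p on p cores. The following is a minimal sketch that assumes these definitions and a single-core baseline T_1. The abstract does not state its exact baseline, and the function names are hypothetical.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p, assuming T_1 is a single-core run (hypothetical helper)."""
    return t_serial / t_parallel


def parallel_efficiency(s: float, n_cores: int) -> float:
    """Parallel efficiency E = S / p for a fixed problem size (strong scaling)."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return s / n_cores


# Consistency check of the reported figures: ~47x speedup on 60 CPU cores.
e = parallel_efficiency(47.0, 60)
print(f"{e:.1%}")  # 78.3%
```

An efficiency of about 78% corresponds to a speedup of about 47 on 60 cores, so the two headline numbers agree. Ideal linear scaling would give E = 1.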