Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost of training DRL models is a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We first validate an existing state-of-the-art DRL framework used for AFC problems and discuss its efficiency bottlenecks. We then decompose the framework, run extensive scalability benchmarks on its individual components, investigate various hybrid parallelization configurations, and propose efficient parallelization strategies. We also refine input/output (I/O) operations in multi-environment DRL training to reduce the critical overhead associated with data movement. Finally, we demonstrate the optimized framework on a typical AFC problem, where the overall framework achieves near-linear scaling. Parallel efficiency improves from around 49% to approximately 78%, and the training process is accelerated by approximately 47 times using 60 CPU cores. These findings are expected to provide valuable insights for further advancements in DRL-based AFC studies.
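The reported speedup and efficiency figures can be cross-checked with a short calculation. The sketch below assumes the conventional definition of parallel efficiency (speedup divided by the number of cores, measured against a single-core baseline); the abstract does not state which definition the study uses, and the function names are illustrative only. The baseline speedup it derives is implied by the 49% figure under that assumption and is not a number reported in the text.

```python
def parallel_efficiency(speedup: float, n_cores: int) -> float:
    """Efficiency under the assumed definition: speedup / number of cores."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return speedup / n_cores


def speedup_from_efficiency(efficiency: float, n_cores: int) -> float:
    """Invert the same assumed definition to recover the implied speedup."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return efficiency * n_cores


# Figures quoted in the abstract: about 47x speedup on 60 CPU cores.
optimized = parallel_efficiency(47.0, 60)

# Implied (not reported) speedup of the baseline framework at about 49% efficiency.
baseline_speedup = speedup_from_efficiency(0.49, 60)

print(f"optimized efficiency: {optimized:.1%}")
print(f"implied baseline speedup on 60 cores: {baseline_speedup:.1f}x")
```

Under this definition, a 47-fold speedup on 60 cores corresponds to an efficiency of roughly 78%, which matches the figure quoted above.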