Reinforcement learning (RL) has emerged as a promising approach to solving the optimal power flow (OPF) problem. However, the RL-OPF literature is strongly divided on how exactly to formulate the OPF problem as an RL environment. In this work, we collect and implement diverse environment design decisions from the literature regarding training data, observation space, episode definition, and reward function. In an experimental analysis, we show that these environment design options have a significant impact on RL-OPF training performance. Further, we derive initial recommendations for choosing among these design options. The resulting environment framework is fully open-source and can serve as a benchmark for future research in the RL-OPF field.