Dispatch strategies for gas turbines (GTs) are changing in modern electricity grids. The growing share of intermittent renewable energy requires GTs to run more frequent but shorter cycles and to operate more often at partial load. Deep reinforcement learning (DRL) has recently emerged as a tool that can cope with this development and dispatch GTs economically. The key advantages of DRL are model-free optimization and the ability to handle uncertainties, such as those introduced by varying loads or renewable energy production. In this study, three popular DRL algorithms are implemented for an economic GT dispatch problem in a case study in Alberta, Canada. We highlight the benefits of DRL by incorporating existing thermodynamic software provided by Siemens Energy into the environment model and by simulating uncertainty via varying electricity prices, loads, and ambient conditions. Among the tested algorithms and baseline methods, Deep Q-Networks (DQN) obtained the highest rewards, while Proximal Policy Optimization (PPO) was the most sample-efficient. We further propose and implement a method to assign GT operation and maintenance costs dynamically based on operating hours and cycles. Compared to existing methods, our approach better approximates the true cost of modern GT dispatch and hence leads to more realistic policies.
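The idea of assigning operation and maintenance cost from both operating hours and cycles can be illustrated with a minimal sketch. This is not the method of the study, whose details are not given here. It assumes a common "whichever limit is reached first" rule, in which a maintenance interval is consumed either by accumulated fired hours or by accumulated starts, and each time step is charged the share of the interval it consumes. The class name `MaintenanceTracker`, its fields, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceTracker:
    """Hypothetical illustration: charge maintenance cost per time step.

    Assumption (not taken from the study): an interval ends when either
    the hour limit or the start limit is reached, whichever comes first.
    """

    interval_cost: float  # assumed cost of one full maintenance interval
    max_hours: float      # assumed fired-hour limit of the interval
    max_starts: int       # assumed start (cycle) limit of the interval
    hours: float = 0.0    # fired hours accumulated so far
    starts: int = 0       # starts accumulated so far

    def consumed_fraction(self) -> float:
        """Fraction of the interval used up by the binding criterion."""
        return max(self.hours / self.max_hours, self.starts / self.max_starts)

    def step(self, hours_run: float, started: bool) -> float:
        """Update the counters and return the cost attributed to this step."""
        before = self.consumed_fraction()
        self.hours += hours_run
        self.starts += int(started)
        after = self.consumed_fraction()
        return self.interval_cost * (after - before)


# Usage with made-up numbers: one start followed by one more running hour.
tracker = MaintenanceTracker(
    interval_cost=1_000_000.0, max_hours=25_000.0, max_starts=1_000
)
cost_start = tracker.step(hours_run=1.0, started=True)
cost_run = tracker.step(hours_run=1.0, started=False)
```

With these made-up limits, the start is the binding criterion, so the first step carries the cost and the following running hour adds none until the hour-based fraction overtakes the start-based one. In a DRL setting, a per-step cost of this kind could be subtracted from the reward, which makes short cycles more expensive than a fixed hourly rate would.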