Energy storage devices, such as batteries, thermal energy storage, and hydrogen systems, can help mitigate climate change by ensuring a more stable and sustainable power supply. To maximize the effectiveness of such storage, it is crucial to determine the appropriate charging and discharging amounts for each time period. Reinforcement learning is preferred over traditional optimization for energy storage control because it can adapt to dynamic and complex environments. However, charging and discharging levels are continuous, which limits discrete reinforcement learning, and the feasible charge-discharge range varies over time with the state of charge (SoC), which limits conventional continuous reinforcement learning. In this paper, we propose a continuous reinforcement learning approach that takes the time-varying feasible charge-discharge range into account. In addition to the objectives for training the actor (policy learning) and the critic (value learning), we introduce an objective function for learning the feasible action range for each time period. Constraining the charging and discharging levels to the feasible action range prevents the storage from getting stuck in suboptimal states, such as remaining continuously fully charged or fully discharged, and thereby actively promotes its utilization. The experimental results demonstrate that the proposed method further improves the effectiveness of energy storage by actively enhancing its utilization.
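To make the notion of a time-varying feasible charge-discharge range concrete, the following minimal Python sketch computes that range from the current SoC and clips a raw continuous action into it. It is an illustration under stated assumptions, not the method proposed in the paper: the paper learns the feasible range through an additional objective function, whereas this sketch only shows the constraint itself. The function names, the sign convention (positive means charging, negative means discharging), the per-step energy limit, and the lossless storage model are all hypothetical choices made for the example.

```python
def feasible_action_range(soc, capacity_kwh, max_step_kwh):
    """Return (low, high) bounds on the energy exchanged in one time step.

    Assumptions (not from the paper): soc is a fraction in [0, 1],
    positive actions charge, negative actions discharge, no losses.
    """
    # Charging is limited by the remaining headroom and the per-step limit.
    max_charge = min(max_step_kwh, (1.0 - soc) * capacity_kwh)
    # Discharging is limited by the stored energy and the per-step limit.
    max_discharge = min(max_step_kwh, soc * capacity_kwh)
    return -max_discharge, max_charge


def project_to_feasible(raw_action_kwh, soc, capacity_kwh, max_step_kwh):
    """Clip a raw continuous action into the SoC-dependent feasible range."""
    low, high = feasible_action_range(soc, capacity_kwh, max_step_kwh)
    return min(max(raw_action_kwh, low), high)


def next_soc(soc, action_kwh, capacity_kwh):
    """Update the SoC after applying a (feasible) action, ignoring losses."""
    return soc + action_kwh / capacity_kwh
```

In this sketch the bounds change at every step because they depend on the SoC: a nearly full storage has almost no room to charge, and a nearly empty one has almost nothing to discharge. A fixed action interval, as used in conventional continuous reinforcement learning, cannot express this dependence.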