This paper develops a decentralized multi-agent reinforcement learning (Dec-MARL) method to solve the state-of-charge (SoC) balancing problem in a distributed energy storage system (DESS). First, the SoC balancing problem is formulated as a finite Markov decision process with action constraints derived from demand balance, which can be solved by Dec-MARL. Specifically, a first-order average consensus algorithm is used to expand each agent's observation of the DESS state in a fully decentralized way, and the agents (i.e., energy storage units) decide their initial actions (i.e., output power) from these observations. To bring the final actions into the allowable range, a counterfactual demand balance algorithm is proposed that reconciles the initial actions with the total demand. Next, the agents execute the final actions and receive local rewards from the environment, and the DESS moves to the next state. Finally, through the first-order average consensus algorithm, the agents obtain the average reward and the expanded observation of the next state for later training. With this procedure, Dec-MARL achieves strong performance in a fully decentralized system without requiring expert experience or a complicated model. It is also flexible and can be extended straightforwardly to other decentralized multi-agent systems. Extensive simulations validate the effectiveness and efficiency of Dec-MARL.
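To make the consensus step concrete, the following is a minimal sketch of a generic first-order average consensus iteration, in which each unit repeatedly nudges its estimate toward those of its neighbors. It is not taken from the paper: the function name `average_consensus`, the four-unit ring topology, the step size `epsilon`, and the SoC values are all illustrative assumptions. The sketch assumes an undirected, connected communication graph and a step size smaller than the reciprocal of the largest node degree, which are standard conditions for convergence to the network-wide mean.

```python
def average_consensus(values, neighbors, epsilon, steps):
    """Generic first-order average consensus (illustrative, not the paper's code).

    Each agent i updates x_i <- x_i + epsilon * sum_{j in N_i} (x_j - x_i),
    using only the values held by its direct neighbors N_i.
    """
    x = list(values)
    for _ in range(steps):
        # All agents update synchronously from the previous iterate.
        x = [
            xi + epsilon * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Hypothetical example: four storage units connected in a ring.
soc = [0.9, 0.5, 0.7, 0.3]                            # assumed local SoC values
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # assumed neighbor lists

# epsilon = 0.25 is below 1 / (max degree) = 0.5 for this ring.
estimates = average_consensus(soc, ring, epsilon=0.25, steps=50)
```

Under these assumptions every entry of `estimates` approaches the mean of `soc`, so each unit ends up with the same global quantity without any central coordinator. The same iteration could be applied to local rewards to obtain an average reward at every agent.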