This paper presents a novel safe reinforcement learning algorithm for the strategic bidding of Virtual Power Plants (VPPs) in day-ahead electricity markets. The proposed algorithm uses the Deep Deterministic Policy Gradient (DDPG) method to learn competitive bidding policies without requiring an accurate market model. To account for the complex internal physical constraints of VPPs, we introduce two enhancements to the DDPG method. First, we derive a projection-based safety shield that restricts the agent's actions to the feasible space defined by the non-linear power flow equations and the operating constraints of distributed energy resources. Second, we add a penalty for shield activation to the reward function, which incentivizes the agent to learn a safer policy. A case study based on the IEEE 13-bus network demonstrates that the proposed approach enables the agent to learn a highly competitive, safe strategic policy.
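The two enhancements can be illustrated with a minimal sketch. It is not the paper's implementation: the feasible space here is a simple box of hypothetical power limits standing in for the non-linear power flow and DER constraints, and the names `shield`, `shaped_reward`, and `penalty_weight`, as well as the choice of a penalty proportional to the size of the correction, are assumptions made for illustration.

```python
import numpy as np


def shield(action, lower, upper):
    """Project a proposed action onto a box [lower, upper].

    Assumption: a box replaces the real feasible set (power flow equations
    and DER operating limits). For a box, clipping is the exact Euclidean
    projection.
    """
    safe_action = np.clip(action, lower, upper)
    # Size of the correction; zero means the shield did not activate.
    correction = float(np.linalg.norm(safe_action - action))
    return safe_action, correction


def shaped_reward(market_reward, correction, penalty_weight=10.0):
    """Subtract a penalty that grows with the shield's correction.

    Assumption: a linear penalty; the source only states that shield
    activation is penalized in the reward.
    """
    return market_reward - penalty_weight * correction


# Hypothetical limits for two controllable resources (arbitrary units).
lower = np.array([0.0, -1.0])
upper = np.array([2.0, 1.0])

# An action proposed by the actor network that violates the first limit.
proposed = np.array([2.5, 0.2])

safe, correction = shield(proposed, lower, upper)
reward = shaped_reward(market_reward=5.0, correction=correction)
```

In a DDPG training loop of this kind, the projected action would be sent to the environment while the shaped reward is stored in the replay buffer, so the actor is discouraged from proposing actions that need correction.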