Energy markets can create incentives for undesired behavior by market participants. Multi-agent reinforcement learning (MARL) is a promising new approach for determining the expected behavior of energy market participants. However, reinforcement learning requires many interactions with the system to converge, and the power system environment often involves expensive computations, e.g., optimal power flow (OPF) calculations for market clearing. To tackle this complexity, we provide a model of the energy market to a basic MARL algorithm, in the form of a learned OPF approximation and explicit market rules. The learned OPF surrogate model makes explicitly solving the OPF unnecessary. Our experiments demonstrate that the model additionally reduces training time by about one order of magnitude, but at the cost of a slightly worse approximation of the Nash equilibrium. Potential applications of our method are market design, more realistic modeling of market participants, and analysis of manipulative behavior.