Energy markets can create incentives for undesired behavior by market participants. Multi-agent reinforcement learning (MARL) is a promising new approach to predicting the expected behavior of energy market participants. However, reinforcement learning requires many interactions with the system to converge, and the power system environment often involves extensive computations, e.g., optimal power flow (OPF) calculations for market clearing. To tackle this complexity, we provide a model of the energy market to a basic MARL algorithm in the form of a learned OPF approximation and explicit market rules. The learned OPF surrogate model makes explicitly solving the OPF unnecessary. Our experiments demonstrate that the model additionally reduces training time by about one order of magnitude, but at the cost of a slightly worse approximation of the Nash equilibrium. Potential applications of our method are market design, more realistic modeling of market participants, and analysis of manipulative behavior.