Harnessing Implicit Cooperation: A Multi-Agent Reinforcement Learning Approach Towards Decentralized Local Energy Markets

This paper proposes implicit cooperation, a framework that enables decentralized agents to approximate optimal coordination in local energy markets without explicit peer-to-peer communication. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) and solve it with multi-agent reinforcement learning, in which agents use stigmergic signals (system-level key performance indicators) to infer and react to the global state. Using a 3×3 factorial design on an IEEE 34-node topology, we evaluate three training paradigms (CTCE, CTDE, DTDE) and three algorithms (PPO, APPO, SAC). The results identify APPO-DTDE as the optimal configuration, achieving a coordination score of 91.7% relative to the theoretical centralized benchmark (CTCE). However, a critical trade-off emerges between efficiency and stability: while the centralized benchmark maximizes allocative efficiency with a peer-to-peer trade ratio of 0.6, the fully decentralized approach (DTDE) demonstrates superior physical stability. Specifically, DTDE reduces the variance of grid balance by 31% compared to hybrid architectures, establishing a highly predictable, import-biased load profile that simplifies grid regulation. Furthermore, topological analysis reveals emergent spatial clustering: decentralized agents self-organize into stable trading communities to minimize congestion penalties. SAC excelled in hybrid settings but failed in decentralized environments due to entropy-driven instability. These results show that stigmergic signaling provides sufficient context for complex grid coordination, offering a robust, privacy-preserving alternative to expensive centralized communication infrastructure.
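To make the quantities in the abstract concrete, the following is a minimal Python sketch of how a stigmergic observation and the reported metrics could be computed. It is an illustration under stated assumptions, not the paper's implementation: the `SystemKPIs` fields, the function names, and the exact metric definitions (score as a percentage of a benchmark, peer-to-peer ratio as a share of traded energy, variance reduction as a relative drop in population variance) are all hypothetical choices, since the abstract does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import pvariance


@dataclass(frozen=True)
class SystemKPIs:
    """Hypothetical system-level indicators visible to every agent."""
    grid_balance: float     # net import (+) or export (-); units assumed
    p2p_trade_ratio: float  # share of energy traded peer-to-peer, in [0, 1]


def build_observation(local_state: list[float], kpis: SystemKPIs) -> list[float]:
    """Append the shared KPI signal to an agent's private local features.

    No peer-to-peer messages are involved: the only shared information
    is the system-level signal (assumed layout, for illustration only).
    """
    return [*local_state, kpis.grid_balance, kpis.p2p_trade_ratio]


def p2p_trade_ratio(p2p_energy: float, total_energy: float) -> float:
    """Assumed definition: peer-to-peer energy over total traded energy."""
    return p2p_energy / total_energy if total_energy > 0 else 0.0


def relative_score(score: float, benchmark: float) -> float:
    """Score expressed as a percentage of a benchmark score."""
    return 100.0 * score / benchmark


def variance_reduction(baseline: list[float], candidate: list[float]) -> float:
    """Percentage drop in population variance relative to a baseline series."""
    base_var = pvariance(baseline)
    return 100.0 * (base_var - pvariance(candidate)) / base_var
```

Under these assumed definitions, `relative_score(dtde_score, ctce_score)` would correspond to the kind of figure reported as 91.7%, and `variance_reduction(hybrid_balance, dtde_balance)` to the kind reported as 31%, where the argument names stand for placeholder inputs rather than data from the paper.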
