In this paper, we investigate the scheduling of diesel generators (DGs) in an Internet of Things (IoT)-driven isolated microgrid (MG) using deep reinforcement learning (DRL). Renewable energy is fully exploited despite the uncertainty of renewable generation and load demand. The DRL agent learns an optimal policy from historical renewable and load data of previous days, and the learned policy generates real-time decisions based on observations of renewable and load data from previous hours, collected by connected sensors. The goal is to reduce operating cost while ensuring supply-demand balance. Specifically, we formulate a novel finite-horizon partially observable Markov decision process (POMDP) model that accounts for the spinning reserve. To overcome the challenge of the discrete-continuous hybrid action space that arises from the binary DG switching decision and the continuous energy dispatch (ED) decision, we propose a DRL algorithm, namely the hybrid action finite-horizon RDPG (HAFH-RDPG). HAFH-RDPG seamlessly integrates two classical DRL algorithms, i.e., deep Q-network (DQN) and recurrent deterministic policy gradient (RDPG), based on a finite-horizon dynamic programming (DP) framework. Extensive experiments are performed with real-world data in an IoT-driven MG to evaluate the capability of the proposed algorithm in handling the uncertainty due to inter-hour and inter-day power fluctuations and to compare its performance with that of the benchmark algorithms.
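To make the notion of a discrete-continuous hybrid action concrete, the following is a minimal sketch, not the HAFH-RDPG algorithm itself, whose internals the abstract does not specify. It assumes one common pattern: a continuous "actor" proposes a dispatch level, and a value function scores each binary switching choice paired with that dispatch. All names (`HybridAction`, `select_hybrid_action`, `toy_actor`, `toy_critic`), the power limits, and the cost coefficients are hypothetical placeholders rather than values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridAction:
    dg_on: bool          # binary DG switching decision
    dg_power_kw: float   # continuous energy dispatch decision


def select_hybrid_action(net_load_kw, actor, critic,
                         p_min_kw=10.0, p_max_kw=100.0):
    """Score each switch state with its dispatch and return the best pair.

    `actor` and `critic` stand in for trained networks; the limits
    p_min_kw and p_max_kw are assumed values for illustration only.
    """
    candidates = []
    for dg_on in (False, True):
        if dg_on:
            # Clip the proposed dispatch to the assumed operating range.
            power = min(max(actor(net_load_kw), p_min_kw), p_max_kw)
        else:
            power = 0.0
        action = HybridAction(dg_on, power)
        candidates.append((critic(net_load_kw, action), action))
    # Pick the candidate with the highest estimated value.
    return max(candidates, key=lambda c: c[0])[1]


def toy_actor(net_load_kw):
    # Hypothetical actor: try to cover the net load exactly.
    return net_load_kw


def toy_critic(net_load_kw, action):
    # Hypothetical value: negative of fuel cost plus an unmet-load penalty.
    fuel = 0.3 * action.dg_power_kw + (5.0 if action.dg_on else 0.0)
    unmet = max(net_load_kw - action.dg_power_kw, 0.0)
    return -(fuel + 10.0 * unmet)


if __name__ == "__main__":
    for net_load in (0.0, 50.0, 500.0):
        print(net_load, select_hybrid_action(net_load, toy_actor, toy_critic))
```

In this toy setting the generator stays off when renewables already cover the load and is switched on, with dispatch clipped to the assumed limits, when the unmet-load penalty would otherwise dominate.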