With the proliferation of electric vehicles (EVs), the transportation network and the power grid are becoming increasingly interdependent, coupled through charging stations. The accompanying growth in charging demand poses challenges for both networks and highlights the importance of charging coordination. The existing literature largely overlooks the interaction between power grid security and traffic efficiency. We therefore study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize overall traffic efficiency while ensuring the safety of the power grid. We formulate this problem, for the first time, as a constrained Markov decision process (CMDP) and propose an online prediction-assisted safe reinforcement learning (OP-SRL) method, which extends proximal policy optimization (PPO), to learn an optimal and secure policy. Specifically, we address two challenges. First, to handle the safety constraint, we convert the constrained optimization problem into an equivalent unconstrained one using the Lagrangian method. Second, to account for the long and uncertain delay between issuing a CS recommendation and the start of charging, we introduce an online sequence-to-sequence (Seq2Seq) predictor that augments the state and guides the agent toward forward-looking decisions. Finally, we conduct comprehensive experiments on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with the IEEE 33-bus and IEEE 69-bus distribution systems, respectively. The results show that the proposed method outperforms the baselines in road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network further illustrates the method's applicability in practice.
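The Lagrangian step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scalar safety cost per step, a cost budget `cost_limit`, and a projected gradient-ascent update of the multiplier, all of which are illustrative choices. The class `LagrangeMultiplier` and its parameter names are hypothetical. The idea is that the policy optimizer (for example PPO) is trained on the penalized signal `reward - lambda * cost`, while `lambda` rises whenever the average episode cost exceeds the budget and decays toward zero otherwise.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Dual variable for one constraint J_c(pi) <= cost_limit (hypothetical helper)."""

    cost_limit: float    # assumed safety budget d
    lr: float = 0.05     # assumed dual step size
    value: float = 0.0   # lambda, kept non-negative

    def penalized_reward(self, reward: float, cost: float) -> float:
        # Per-step signal that would be handed to the policy optimizer.
        return reward - self.value * cost

    def update(self, mean_episode_cost: float) -> float:
        # Projected gradient ascent on lambda: it grows while the constraint
        # is violated and shrinks (never below zero) once the cost is within budget.
        violation = mean_episode_cost - self.cost_limit
        self.value = max(0.0, self.value + self.lr * violation)
        return self.value


if __name__ == "__main__":
    lam = LagrangeMultiplier(cost_limit=1.0, lr=0.5)
    lam.update(mean_episode_cost=3.0)         # constraint violated, lambda rises
    print(lam.penalized_reward(10.0, 2.0))    # reward is discounted by lambda * cost
    lam.update(mean_episode_cost=0.0)         # constraint satisfied, lambda decays
    print(lam.value)
```

In a full training loop, `update` would typically be called once per batch of rollouts, after the policy step, so that the multiplier changes more slowly than the policy; that scheduling is a common design choice rather than something the abstract specifies.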