Electric vehicles (EVs) play a critical role in autonomous mobility-on-demand (AMoD) systems, but their unique charging patterns increase the model uncertainty in these systems (e.g., in the state transition probabilities). Because the training environment usually differs from the test or true environment, incorporating model uncertainty into system design is critically important in real-world applications. However, the existing literature has not yet explicitly considered model uncertainty in EV AMoD system rebalancing, and the coexistence of model uncertainty with constraints that the decisions must satisfy makes the problem even more challenging. In this work, we design a robust and constrained multi-agent reinforcement learning (MARL) framework with state transition kernel uncertainty for EV AMoD systems. We then propose a robust and constrained MARL algorithm (ROCOMA) with robust natural policy gradients (RNPG) that trains a robust EV rebalancing policy to balance the supply-demand ratio and the charging utilization rate across the city under model uncertainty. Experiments show that ROCOMA can learn an effective and robust rebalancing policy. It outperforms non-robust MARL methods in the presence of model uncertainty. It increases system fairness by 19.6% and decreases rebalancing costs by 75.8%.