Electric autonomous vehicles (EAVs) are attracting attention for future autonomous mobility-on-demand (AMoD) systems because of their economic and societal benefits. However, EAVs' distinctive charging patterns (long charging times, frequent charging, unpredictable charging behavior, etc.) make it difficult to predict EAV supply accurately in E-AMoD systems. Uncertainty in mobility demand prediction adds to the difficulty, making the design of an integrated vehicle balancing solution under both supply and demand uncertainty an urgent and challenging task. Despite the success of reinforcement learning-based E-AMoD balancing algorithms, state uncertainty in EAV supply or mobility demand remains unexplored. In this work, we design a multi-agent reinforcement learning (MARL)-based framework for EAV balancing in E-AMoD systems, in which adversarial agents model the EAV supply and mobility demand uncertainties that may undermine vehicle balancing solutions. We then propose a Robust E-AMoD Balancing MARL (REBAMA) algorithm that trains a robust EAV balancing policy to balance both the supply-demand ratio and the charging utilization rate across the whole city. Experiments show that the proposed robust method outperforms a non-robust MARL method that does not consider state uncertainties, improving the reward, charging utilization fairness, and supply-demand fairness by 19.28%, 28.18%, and 3.97%, respectively. Compared with a robust optimization-based method, the proposed MARL algorithm improves the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.