We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects them, aiming to maximize its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit and thereby lowers performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves the goal conflict that otherwise arises between the trained agents and the operator by assigning rewards to agents via a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis showing that the use of global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.