We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects them with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to obtain scalable yet performant algorithms, but they train agents on local rewards, which distorts the reward signal with respect to the system-wide profit and lowers performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems. It resolves the existing goal conflicts between the trained agents and the operator by assigning rewards to agents using a counterfactual baseline. On real-world data, our algorithm shows statistically significant improvements across various settings compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis showing that using global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
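To make the idea of a counterfactual baseline more concrete, the following is a minimal sketch of one common formulation: an agent's credit is the global value of the joint action minus the expected global value when only that agent's action is resampled from its policy. This is an illustration under stated assumptions, not the implementation from the linked repository. The function names, the toy profit function, and the two-vehicle setting are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def counterfactual_advantage(
    global_value: Callable[[JointAction], float],
    joint_action: JointAction,
    agent: int,
    policy: Sequence[float],
) -> float:
    """Credit for one agent: global value of the joint action minus a baseline
    that averages over this agent's alternative actions (weighted by its
    policy) while all other agents' actions stay fixed."""
    baseline = 0.0
    for alt_action, prob in enumerate(policy):
        # Swap in the alternative action for this agent only.
        alt_joint = joint_action[:agent] + (alt_action,) + joint_action[agent + 1:]
        baseline += prob * global_value(alt_joint)
    return global_value(joint_action) - baseline


def toy_global_profit(joint_action: JointAction) -> float:
    """Hypothetical system-wide profit: one request worth 10 is served
    if at least one vehicle accepts it (action 1); otherwise profit is 0."""
    return 10.0 if any(a == 1 for a in joint_action) else 0.0


# Assumed uniform policy over {0: reject, 1: accept} for vehicle 0.
uniform = [0.5, 0.5]

# Both vehicles accept: vehicle 0 adds nothing beyond what vehicle 1 provides.
adv_redundant = counterfactual_advantage(toy_global_profit, (1, 1), 0, uniform)
# Only vehicle 0 accepts: its action is what secures the profit.
adv_pivotal = counterfactual_advantage(toy_global_profit, (1, 0), 0, uniform)
# Nobody accepts: vehicle 0 is penalized for rejecting a serviceable request.
adv_missed = counterfactual_advantage(toy_global_profit, (0, 0), 0, uniform)
```

In this toy setting the global profit is identical whether one or both vehicles accept, so a plain global reward could not tell the two cases apart. The counterfactual term separates them: the redundant acceptance receives zero credit, the pivotal one receives positive credit, and the missed request receives negative credit.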