The development of mobility-on-demand services, rich transportation data sources, and autonomous vehicles (AVs) creates significant opportunities for shared-use AV mobility services (SAMSs) to provide accessible and demand-responsive personal mobility. SAMS fleet operation involves multiple interrelated decisions, with a primary focus on efficiently fulfilling passenger ride requests with a high level of service quality. This paper focuses on improving the efficiency and service quality of a SAMS vehicle fleet via anticipatory repositioning of idle vehicles. The rebalancing problem is formulated as a Markov decision process (MDP), which we propose solving with a reinforcement learning method based on advantage actor-critic (A2C). The proposed approach learns a rebalancing policy that anticipates future demand and cooperates with an optimization-based assignment strategy. The approach allows for centralized repositioning decisions and can handle large vehicle fleets, since the problem size does not change with the fleet size. Two versions of the A2C AV repositioning approach are tested using New York City taxi data and an agent-based simulation tool. The first version, A2C-AVR(A), learns to anticipate future demand based on past observations, while the second, A2C-AVR(B), uses demand forecasts. Compared with an optimization-based rebalancing approach, both models significantly reduce mean passenger waiting times while slightly increasing the percentage of empty fleet miles travelled. The experiments demonstrate the model's ability to anticipate future demand and its transferability to cases unseen at the training stage.
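For readers unfamiliar with A2C, the textbook one-step form of the advantage estimate and the two associated losses is sketched below. This is the generic formulation and not necessarily the variant used in this paper; the symbols (state $s_t$, action $a_t$, reward $r_t$, discount factor $\gamma$, policy $\pi_\theta$, value function $V_\phi$) are assumptions introduced here for illustration only.

```latex
% Generic one-step A2C quantities (illustrative notation, requires amsmath).
% \hat{A}_t is treated as a constant when differentiating the actor loss.
\begin{align}
  \hat{A}_t &= r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t), \\
  L_{\text{actor}}(\theta) &= -\log \pi_\theta(a_t \mid s_t)\, \hat{A}_t, \\
  L_{\text{critic}}(\phi) &= \hat{A}_t^{2}.
\end{align}
```

In this generic form, the critic's value estimate serves as a baseline, so the actor is reinforced only for actions that turn out better than expected from the current state.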