Autonomous Mobility-on-Demand (AMoD) systems are a rapidly evolving mode of transportation in which a centrally coordinated fleet of self-driving vehicles dynamically serves travel requests. The control of these systems is typically formulated as a large network optimization problem, and reinforcement learning (RL) has recently emerged as a promising approach to the open challenges in this space. However, current RL-based approaches focus exclusively on learning from online data, ignoring the per-sample cost of interacting with real-world transportation systems. To address this limitation, we propose to formalize the control of AMoD systems through the lens of offline reinforcement learning and to learn effective control strategies solely from offline data, which is readily available to current mobility operators. We further investigate design decisions and provide experiments on real-world mobility systems showing that offline learning makes it possible to recover AMoD control policies that (i) perform on par with online methods, (ii) drastically improve data efficiency, and (iii) completely eliminate the need for complex simulated environments. Crucially, this paper demonstrates that offline reinforcement learning is a promising paradigm for applying RL-based solutions within economically critical systems, such as mobility systems.