Surge Routing: Event-informed Multiagent Reinforcement Learning for Autonomous Rideshare

Large events such as conferences, concerts, and sports games often cause surges in demand for ride services that are not captured in average demand patterns, posing unique challenges for routing algorithms. We propose a learning framework for an autonomous fleet of taxis that scrapes event data from the internet to predict and adapt to surges in demand, and that generates cooperative routing and pickup policies that service more requests than other routing protocols. The framework combines (i) an event processing framework that scrapes the internet for event information and generates dense vector representations that can be used as input features for a neural network that predicts demand; (ii) a two-neural-network system that uses these dense vector representations to predict hourly demand over the entire map; (iii) a probabilistic approach that leverages locale occupancy schedules to map publicly available sector-level demand data to discretized street intersections; and (iv) a scalable model-based reinforcement learning framework that uses the predicted demand over intersections to anticipate surges and route taxis using one-agent-at-a-time rollout with limited sampling certainty equivalence. We learn routing and pickup policies using real NYC rideshare data for 2022 and information on more than 2000 events across 300 unique venues in Manhattan. We test our approach with a fleet of 100 taxis on a map with 38 sectors (2235 street intersections). Our experimental results demonstrate that, under surge demand conditions, our method obtains routing policies that service $6$ more requests per minute on average (around $360$ more requests per hour) than other model-based RL frameworks and classical operations research algorithms.
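To make component (iii) concrete, the following is a minimal sketch of one way sector-level demand could be spread over intersections. It assumes, purely for illustration, that each intersection carries a non-negative occupancy weight and that requests are sampled in proportion to those weights. The abstract does not specify the exact probabilistic model, so the function names `intersection_probabilities` and `assign_requests`, the weights, and the uniform fallback are all hypothetical.

```python
import random
from collections import Counter


def intersection_probabilities(occupancy_weights):
    """Normalize non-negative per-intersection weights into probabilities.

    `occupancy_weights` is a hypothetical stand-in for whatever the
    locale occupancy schedules provide; it is not the paper's data format.
    """
    total = sum(occupancy_weights.values())
    if total <= 0:
        # Assumption: with no occupancy information, fall back to uniform.
        n = len(occupancy_weights)
        return {node: 1.0 / n for node in occupancy_weights}
    return {node: w / total for node, w in occupancy_weights.items()}


def assign_requests(sector_demand, occupancy_weights, rng):
    """Sample an intersection for each of `sector_demand` requests."""
    probs = intersection_probabilities(occupancy_weights)
    nodes = list(probs)
    draws = rng.choices(nodes, weights=[probs[n] for n in nodes], k=sector_demand)
    return Counter(draws)


# Hypothetical sector with three intersections; the weights are made up.
weights = {"A": 120.0, "B": 60.0, "C": 20.0}
probs = intersection_probabilities(weights)
counts = assign_requests(50, weights, random.Random(0))
```

Here `probs` assigns intersection `A` a probability of 0.6, and `counts` distributes the 50 sector-level requests over the three intersections. Sampling rather than deterministic rounding is a design choice of this sketch: it keeps the per-intersection totals consistent with the sector total while preserving randomness in where requests appear.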
