On-demand ride-sharing platforms, such as Uber and Lyft, face the intricate real-time challenge of bundling and matching passengers, each with distinct origins and destinations, to available vehicles, all while navigating significant system uncertainties. Because the large number of drivers and orders produces an extensive observation space, order dispatching, though fundamentally a centralized task, is often addressed using Multi-Agent Reinforcement Learning (MARL). However, independent MARL methods fail to capture global information and exhibit poor cooperation among workers, while Centralized Training Decentralized Execution (CTDE) MARL methods suffer from the curse of dimensionality. To overcome these challenges, we propose Triple-BERT, a centralized single-agent reinforcement learning method designed specifically for large-scale order dispatching on ride-sharing platforms. Built on a variant of TD3, our approach addresses the vast action space through an action decomposition strategy that breaks down the joint action probability into individual driver action probabilities. To handle the extensive observation space, we introduce a novel BERT-based network, in which parameter reuse mitigates parameter growth as the number of drivers and orders increases, and the attention mechanism effectively captures the complex relationships among the large pool of drivers and orders. We validate our method using a real-world ride-hailing dataset from Manhattan. Triple-BERT achieves an improvement of approximately 11.95% over current state-of-the-art methods, with a 4.26% increase in served orders and a 22.25% reduction in pickup times. Our code, trained model parameters, and processed data are publicly available at https://github.com/RS2002/Triple-BERT.
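As a rough illustration of the action decomposition idea described above, the following sketch assumes the joint action probability factorizes into a product of independent per-driver categorical probabilities. That independence assumption, the function name `joint_log_prob`, the data layout, and the toy numbers are all hypothetical and are not taken from the paper or its repository.

```python
import math


def joint_log_prob(per_driver_probs, joint_action):
    """Log-probability of a joint action under an assumed factorization.

    per_driver_probs[i][k] is driver i's probability of choosing option k
    (hypothetical layout); joint_action[i] is the option driver i chose.
    """
    if len(per_driver_probs) != len(joint_action):
        raise ValueError("one action per driver is required")
    # The sum of logs equals the log of the product of per-driver probabilities.
    return sum(
        math.log(probs[action])
        for probs, action in zip(per_driver_probs, joint_action)
    )


# Toy example: 3 drivers, each choosing between 2 orders or idling (index 2).
per_driver_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
joint_action = [0, 1, 2]
joint_prob = math.exp(joint_log_prob(per_driver_probs, joint_action))
```

Under this assumption, a policy over N drivers with K options each needs only N × K probabilities rather than a table over K^N joint actions. The sketch does not resolve conflicts such as two drivers selecting the same order, which a real dispatcher would have to handle.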