Enhancing Courier Scheduling in Crowdsourced Last-Mile Delivery through Dynamic Shift Extensions: A Deep Reinforcement Learning Approach

Crowdsourced delivery platforms face complex scheduling challenges when matching couriers to customer orders. We consider two types of crowdsourced couriers, committed and occasional, each with a different compensation scheme. Platforms usually schedule committed courier shifts based on predicted demand, so they can devise an offline schedule for committed couriers before the planning period. However, because demand is unpredictable, the offline schedule sometimes needs online adjustment. In this study, we focus on dynamically adjusting the offline schedule through shift extensions for committed couriers. We model this problem as a sequential decision process whose objective is to maximize platform profit by determining the shift extensions of couriers and the assignment of requests to couriers. To solve the model, we develop a Deep Q-Network (DQN) learning approach. Comparing the learned policy with a baseline policy that allows no extensions demonstrates the benefits of shift extensions for platforms: higher reward, lower lost-order costs, and fewer lost requests. Additionally, sensitivity analysis shows that the total extension compensation increases nonlinearly with the arrival rate of requests and linearly with the arrival rate of occasional couriers. In the compensation sensitivity analysis, the normal scenario exhibits the highest average number of shift extensions and, consequently, the lowest average number of lost requests. These findings indicate that the DQN algorithm successfully learns these dynamics.
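
To make the DQN idea concrete, the following is a minimal sketch of the two generic ingredients of Q-learning that a DQN builds on: the one-step temporal-difference target and epsilon-greedy action selection. It is not the authors' implementation. The action labels `NO_EXTENSION` and `EXTEND_SHIFT`, the function names, and the default discount factor are assumptions made for illustration; the abstract does not specify the state, action, or reward definitions, and the paper's action space also covers request-to-courier assignments.

```python
import random

# Hypothetical action labels (assumption): the abstract does not define
# the action space, which in the paper also includes request assignments.
NO_EXTENSION, EXTEND_SHIFT = 0, 1


def td_target(reward, next_q_values, gamma=0.99, terminal=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    `next_q_values` holds the estimated Q-values of every action in the
    next state; in a DQN these would come from a neural network.
    """
    if terminal:
        # No future reward after the final decision epoch.
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For instance, with Q-value estimates `[0.1, 0.9]` for the two hypothetical actions and `epsilon=0.0`, `epsilon_greedy` returns `EXTEND_SHIFT`. A full DQN would additionally need a function approximator, a replay buffer, and a target network, none of which are detailed in the abstract.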

