In pursuit of a more sustainable and cost-efficient last mile, parcel lockers have gained a firm foothold in the parcel delivery landscape. To fully exploit their potential while ensuring customer satisfaction, the locker's limited capacity must be managed well. This is challenging because, from the provider's perspective, future delivery requests and pickup times are stochastic. In response, we propose to dynamically control whether the locker is presented as an available delivery option to each incoming customer, with the goal of maximizing the priority-weighted number of served requests. Additionally, we account for different compartment sizes, which entails a second type of decision: parcels scheduled for delivery must be allocated to compartments. We formalize the problem as an infinite-horizon sequential decision problem and find that exact methods are intractable due to the curses of dimensionality. We therefore develop a solution framework that orchestrates multiple algorithmic techniques rooted in Sequential Decision Analytics and Reinforcement Learning, namely cost function approximation and an offline-trained parametric value function approximation combined with a truncated online rollout. Our novel way of combining these techniques enables us to address the strong interrelations between the two decision types. As a general methodological contribution, we enhance the training of our value function approximation with a modified version of experience replay that enforces structure in the value function. Our computational study shows that our method outperforms a myopic benchmark by 13.7% and an industry-inspired policy by 12.6%.
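To make the two interrelated decision types concrete, the following is a minimal Python sketch, not the authors' framework. It shows how a linear parametric value function approximation could drive both the offer decision and the compartment allocation. The compartment sizes `SIZES`, the basis functions in `features`, the weights `theta`, and the helpers `best_allocation` and `offer_locker` are all hypothetical. The sketch omits the cost function approximation, the truncated online rollout, customer choice behavior, and pickups.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical compartment sizes, ordered from smallest to largest.
SIZES = ("S", "M", "L")


@dataclass(frozen=True)
class LockerState:
    # Number of free compartments per size, aligned with SIZES.
    free: Tuple[int, ...]


def features(state: LockerState) -> Tuple[float, ...]:
    # Hypothetical basis functions: a constant term plus free compartments per size.
    return (1.0,) + tuple(float(n) for n in state.free)


def vfa(state: LockerState, theta: Sequence[float]) -> float:
    # Linear parametric value function approximation: theta . phi(state).
    return sum(t * f for t, f in zip(theta, features(state)))


def best_allocation(
    state: LockerState, parcel_size: str, theta: Sequence[float]
) -> Optional[LockerState]:
    # Allocation decision: among all free compartments large enough for the parcel,
    # pick the one whose post-decision state has the highest approximate value.
    candidates = []
    for i in range(SIZES.index(parcel_size), len(SIZES)):
        if state.free[i] > 0:
            free = list(state.free)
            free[i] -= 1
            candidates.append(LockerState(tuple(free)))
    if not candidates:
        return None  # no fitting compartment is free
    return max(candidates, key=lambda s: vfa(s, theta))


def offer_locker(
    state: LockerState, parcel_size: str, priority: float, theta: Sequence[float]
) -> bool:
    # Offer decision: present the locker only if the priority reward plus the
    # approximate value after allocation is at least the value of keeping capacity free.
    post = best_allocation(state, parcel_size, theta)
    if post is None:
        return False
    return priority + vfa(post, theta) >= vfa(state, theta)
```

For example, with `theta = (0.0, 0.5, 1.0, 2.0)` and free compartments `(2, 1, 0)`, a small parcel of priority 1.0 is offered the locker and placed in a small compartment. A medium parcel of priority 0.2 is not, because using the last medium compartment lowers the approximate value by more than the reward it earns.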