Cooperative air-ground delivery has emerged as a promising logistics paradigm that leverages the complementary strengths of UAVs and ground carriers. However, effective dispatching in such heterogeneous systems faces two critical challenges: i) the heterogeneity between flight and road dynamics, and ii) the scalability bottleneck arising from the exponential growth of decision variables in large-scale fleets. To address these challenges, we propose HRL4AG, a Hierarchical Reinforcement Learning framework for cooperative Air-Ground delivery. Specifically, HRL4AG employs a high-level manager that tackles the scalability bottleneck by decomposing the joint action space, and mode-specific workers that encode the distinct flight and road dynamics to address the heterogeneity. Furthermore, a novel internal reward mechanism is designed to guide the hierarchical policy learning, addressing the credit-assignment problem in sparse-reward settings. Extensive experiments on two real-world datasets and an evaluation platform demonstrate that HRL4AG significantly outperforms state-of-the-art baselines, improving the delivery success rate by up to 26% while achieving an 80-fold increase in computational efficiency.
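The manager-worker decomposition described above can be sketched as follows. This is a minimal illustration only: the class names, the heuristic mode-assignment rule, the fixed speeds, and the lateness-based internal reward are all placeholder assumptions, not the learned policies or the actual reward design of HRL4AG.

```python
class Manager:
    """High-level policy: decomposes the joint action space by assigning
    each pending order to a delivery mode one at a time, instead of
    searching the exponential joint assignment space directly."""

    def assign_mode(self, order):
        # Illustrative heuristic; a learned high-level policy would replace it.
        return "air" if order["distance_km"] < 5.0 else "ground"


class Worker:
    """Low-level, mode-specific policy: each worker encodes its own
    (flight or road) dynamics, here via a mode-specific travel model."""

    def __init__(self, mode, speed_kmh):
        self.mode = mode
        self.speed_kmh = speed_kmh

    def eta_minutes(self, order):
        return 60.0 * order["distance_km"] / self.speed_kmh


def internal_reward(eta_min, deadline_min):
    """Dense internal signal standing in for the sparse delivery outcome:
    +1 for a predicted on-time delivery, a graded penalty for lateness."""
    if eta_min <= deadline_min:
        return 1.0
    return -(eta_min - deadline_min) / deadline_min


def dispatch(orders):
    """Run the two-level loop: manager picks a mode, the matching worker
    evaluates it under that mode's dynamics, and the internal reward
    scores the decision."""
    manager = Manager()
    workers = {"air": Worker("air", speed_kmh=60.0),
               "ground": Worker("ground", speed_kmh=30.0)}
    plan = []
    for order in orders:
        mode = manager.assign_mode(order)        # high-level decision
        eta = workers[mode].eta_minutes(order)   # mode-specific dynamics
        plan.append((order["id"], mode,
                     internal_reward(eta, order["deadline_min"])))
    return plan
```

Because each order is assigned independently by the manager, the per-step decision space grows linearly rather than exponentially with fleet size, which is the intuition behind the scalability gain.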