The capacitated location-routing problem (CLRP) is a classical problem in combinatorial optimization that requires making location and routing decisions simultaneously. In CLRPs, the complex constraints and the intricate interdependencies among decisions make the problem challenging to solve. With the emergence of deep reinforcement learning (DRL), DRL has been extensively applied to the vehicle routing problem and its variants, whereas research on CLRPs remains largely unexplored. In this paper, we propose DRL with heterogeneous query (DRLHQ) to solve the CLRP and the open CLRP (OCLRP), respectively. To the best of our knowledge, we are the first to propose an end-to-end learning approach for CLRPs, following the encoder-decoder structure. In particular, we reformulate the CLRPs as a Markov decision process tailored to the different decision types, a general modeling framework that can be adapted to other DRL-based methods. To better handle the interdependency between location and routing decisions, we further introduce a novel heterogeneous querying attention mechanism designed to adapt dynamically to the different decision-making stages. Experimental results on both synthetic and benchmark datasets demonstrate that our approach achieves superior solution quality and better generalization than representative traditional and DRL-based baselines on both the CLRP and the OCLRP.