This paper aims to develop a learning method for a special class of traveling salesman problems (TSPs), namely the pickup-and-delivery TSP (PDTSP), which seeks the shortest tour through a set of one-to-one pickup-and-delivery nodes. Here, one-to-one means that the transported people or goods are associated with designated pairs of pickup and delivery nodes, in contrast to settings where indistinguishable goods can be delivered to any node. In PDTSP, precedence constraints must be satisfied: each pickup node must be visited before its corresponding delivery node. Classic operations research (OR) algorithms for PDTSP are difficult to scale to large problem instances. Recently, reinforcement learning (RL) has been applied to TSPs. The basic idea is to explore and evaluate visiting sequences in a solution space. However, this approach can be computationally inefficient, because it may have to evaluate many infeasible solutions that violate the precedence constraints. To restrict the search to the feasible space, we use operators that always map one feasible solution to another, so no time is spent exploring the infeasible solution space. These operators are evaluated and selected as policies for solving PDTSPs within an RL framework. We compare our method with baselines, including classic OR algorithms and existing learning methods. Results show that our approach finds shorter tours than the baselines.
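The precedence constraint and the idea of an operator that maps one feasible solution to another can be illustrated with a minimal sketch. It assumes a hypothetical encoding: node 0 is the depot, fixed at the start of the tour; `pairs` maps each pickup node to its delivery node; and the tour implicitly returns to the depot. The `relocate` move is one simple operator with the feasibility-preserving property, not necessarily one of the operators used in this paper, and the RL policy that evaluates and selects operators is not shown.

```python
import math
from typing import Dict, List, Tuple


def is_feasible(tour: List[int], pairs: Dict[int, int]) -> bool:
    """Return True if every pickup appears before its paired delivery."""
    pos = {node: i for i, node in enumerate(tour)}
    return all(pos[p] < pos[d] for p, d in pairs.items())


def tour_length(tour: List[int], coords: Dict[int, Tuple[float, float]]) -> float:
    """Euclidean length of the closed tour (last node returns to the first)."""
    return sum(math.dist(coords[a], coords[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))


def relocate(tour: List[int], node: int, target: int,
             pairs: Dict[int, int]) -> List[int]:
    """Hypothetical feasibility-preserving operator.

    Removes `node` (a pickup or delivery, never the depot) and reinserts it
    at index `target` of the reduced tour, clamped into the range that keeps
    the precedence constraint satisfied. The relative order of all other
    nodes is unchanged, so a feasible input always yields a feasible output.
    """
    pickup_of = {d: p for p, d in pairs.items()}
    rest = [v for v in tour if v != node]
    pos = {v: i for i, v in enumerate(rest)}
    if node in pairs:
        # Pickup: insert after the depot and no later than its delivery.
        lo, hi = 1, pos[pairs[node]]
    else:
        # Delivery: insert anywhere after its pickup.
        lo, hi = pos[pickup_of[node]] + 1, len(rest)
    idx = min(max(target, lo), hi)
    return rest[:idx] + [node] + rest[idx:]
```

For example, with `pairs = {1: 3, 2: 4}` and the feasible tour `[0, 1, 2, 3, 4]`, asking to move delivery 3 to index 1 (before its pickup) is clamped to the earliest legal position, giving `[0, 1, 3, 2, 4]`. Because the operator can only produce feasible tours, a search or learned policy built on top of it never spends evaluations on tours that violate precedence.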