This paper presents a new method for solving the orienteering problem (OP) by decomposing it into two subproblems: a knapsack problem (KP) and a traveling salesman problem (TSP). The KP solver selects the nodes to visit, while the TSP solver designs the path through them and helps the KP solver judge constraint violations. To handle constraints, we propose a dual-population coevolutionary algorithm (DPCA) as the KP solver, which maintains a feasible and an infeasible population simultaneously. A dynamic pointer network (DYPN) is introduced as the TSP solver; it takes city locations as input and directly outputs a permutation of the nodes. The model is trained by reinforcement learning, captures both the structural and the dynamic patterns of the given problem, and generalizes to other instances with different scales and distributions. Experimental results show that the proposed algorithm outperforms conventional approaches in terms of training, inference, and generalization ability.
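The division of labor between the two solvers can be illustrated with a minimal sketch. It is not the paper's implementation: a greedy nearest-neighbor heuristic stands in for the DYPN, and the function names (`nearest_neighbor_tour`, `evaluate_selection`), the closed tour starting at depot node 0, and the tour-length `budget` are all assumptions made for illustration. The sketch only shows how a node selection from a KP-style solver might be scored and checked for feasibility by a routing step.

```python
import math


def tour_length(coords, tour):
    """Length of the closed tour that visits `tour` in order and returns to its start."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_neighbor_tour(coords, nodes, depot=0):
    """Stand-in TSP solver (NOT the DYPN): greedy nearest-neighbor from the depot."""
    unvisited = set(nodes) - {depot}
    tour = [depot]
    while unvisited:
        last = coords[tour[-1]]
        # Break distance ties by node index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(last, coords[j]), j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def evaluate_selection(coords, prizes, selected, budget, depot=0):
    """Score a node selection: route it, then compare the tour length to the budget.

    Returns (total_prize, tour, length, feasible). The budget is assumed to be
    a limit on total tour length; the selection itself would come from the KP solver.
    """
    tour = nearest_neighbor_tour(coords, selected, depot)
    length = tour_length(coords, tour)
    prize = sum(prizes[i] for i in tour)
    return prize, tour, length, length <= budget


# Hypothetical instance: four nodes on a 3 x 4 rectangle, node 0 is the depot.
coords = [(0, 0), (3, 0), (3, 4), (0, 4)]
prizes = [0, 5, 7, 2]

small = evaluate_selection(coords, prizes, {1, 2}, budget=12)
large = evaluate_selection(coords, prizes, {1, 2, 3}, budget=12)
```

Here `small` is routed within the budget while `large` collects more prize but exceeds it. In a dual-population scheme such as the one the abstract describes, a selection like `large` would presumably be kept in the infeasible population rather than discarded, though the abstract does not spell out that mechanism.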