This paper presents a novel learning approach for the Dubins Traveling Salesman Problem (DTSP) with Neighborhoods (DTSPN) that quickly produces a tour for a non-holonomic vehicle passing through the neighborhoods of given task points. The method involves two learning phases. First, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the Lin-Kernighan heuristic (LKH) algorithm. Second, a supervised learning phase trains an adaptation network to solve problems without access to the privileged information. Before the first learning phase, a parameter initialization technique based on the demonstration data is applied to improve training efficiency. The proposed method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL-with-demonstration schemes, most of which fail to sense all the task points.
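To make the problem setting concrete, the following minimal sketch simulates a Dubins vehicle (constant speed, bounded turn rate) and checks which task points its path senses. It is not the paper's learned policy or the LKH expert. The disc-shaped neighborhoods, the `sensing_radius` parameter, the Euler integration step, and all function names are illustrative assumptions.

```python
import math


def dubins_step(state, turn_rate, speed=1.0, dt=0.05):
    """Advance a Dubins vehicle (x, y, heading) by one Euler step."""
    x, y, theta = state
    x += speed * math.cos(theta) * dt
    y += speed * math.sin(theta) * dt
    theta = (theta + turn_rate * dt) % (2.0 * math.pi)
    return (x, y, theta)


def rollout(start, turn_rates, speed=1.0, min_turn_radius=1.0, dt=0.05):
    """Simulate turn-rate commands, clipped to the vehicle's curvature limit."""
    max_rate = speed / min_turn_radius  # non-holonomic constraint
    states = [start]
    for u in turn_rates:
        u = max(-max_rate, min(max_rate, u))
        states.append(dubins_step(states[-1], u, speed, dt))
    return states


def sensed_points(states, task_points, sensing_radius):
    """Return indices of task points whose disc neighborhood the path enters."""
    sensed = set()
    for x, y, _ in states:
        for i, (px, py) in enumerate(task_points):
            if math.hypot(px - x, py - y) <= sensing_radius:
                sensed.add(i)
    return sensed


# Hypothetical example: drive straight along the x-axis for 5 time units.
path = rollout((0.0, 0.0, 0.0), [0.0] * 100)
tasks = [(2.0, 0.3), (4.0, -0.4), (3.0, 2.0)]
covered = sensed_points(path, tasks, sensing_radius=0.5)
all_sensed = len(covered) == len(tasks)  # False: the third point is missed
```

A tour is a valid DTSPN solution only if `all_sensed` is true. This is the criterion behind the abstract's remark that most baseline schemes fail to sense all the task points.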