This work proposes a self-supervised training strategy designed for combinatorial problems. An obstacle to applying supervised paradigms to such problems is the need for costly target solutions, often produced with exact solvers. Inspired by semi- and self-supervised learning, we show that generative models can be trained by sampling multiple solutions and using the best one, according to the problem objective, as a pseudo-label. In this way, we iteratively improve the model's generation capability by relying solely on its self-supervision, eliminating the need for optimality information. We validate this Self-Labeling Improvement Method (SLIM) on the Job Shop Scheduling Problem (JSP), a complex combinatorial problem that is receiving much attention from the neural combinatorial optimization community. We propose a generative model based on the well-known Pointer Network and train it with SLIM. Experiments on popular benchmarks demonstrate the potential of this approach, as the resulting models outperform constructive heuristics and state-of-the-art learning proposals for the JSP. Lastly, we demonstrate the robustness of SLIM to various parameters and its generality by applying it to the Traveling Salesman Problem.
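The self-labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy TSP-style objective, the `random_tour` stand-in for the model's stochastic decoder, and all names are hypothetical; in practice the sampler would be the generative model itself, and the returned pseudo-label would drive a standard supervised (e.g. cross-entropy) update.

```python
import math
import random

# Toy objective: length of a closed tour over 2-D points (TSP-style).
def tour_length(points, tour):
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def self_label(points, sample_solution, num_samples=16, seed=0):
    """Sample several solutions and keep the best one as a pseudo-label.

    `sample_solution` stands in for the model's stochastic decoder; any
    callable returning a feasible solution works.
    """
    rng = random.Random(seed)
    samples = [sample_solution(rng) for _ in range(num_samples)]
    return min(samples, key=lambda t: tour_length(points, t))

# Hypothetical stand-in for an untrained model: a uniform random permutation.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]

def random_tour(rng):
    tour = list(range(len(points)))
    rng.shuffle(tour)
    return tour

# The pseudo-label would then serve as the supervised target for the model.
pseudo_label = self_label(points, random_tour)
```

Because the pseudo-label is simply the sampled solution with the best objective value, no optimal solution from an exact solver is ever required.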