In this work, we propose a Self-Supervised training strategy specifically designed for combinatorial problems. One of the main obstacles in applying supervised paradigms to such problems is the need for expensive target solutions as ground truth, often produced with costly exact solvers. Inspired by Semi- and Self-Supervised learning, we show that generative models can be trained easily by sampling multiple solutions and using the best one according to the problem objective as a pseudo-label. In this way, we iteratively improve the model's generation capability by relying only on its own self-supervision, completely removing the need for optimality information. We demonstrate the effectiveness of this Self-Labeling strategy on the Job Shop Scheduling Problem (JSP), a complex combinatorial problem that is receiving much attention from the Reinforcement Learning community. We propose a generative model based on the well-known Pointer Network and train it with our strategy. Experiments on two popular benchmarks demonstrate the potential of this approach, as the resulting models outperform constructive heuristics and current state-of-the-art Reinforcement Learning proposals.
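The Self-Labeling loop described in the abstract (sample several solutions, keep the best one under the problem objective as a pseudo-label, then fit the model to it) can be sketched on a toy instance. The following is a minimal illustration under stated assumptions, not the authors' implementation: the 3-job, 3-machine instance `JOBS` is hypothetical, and a tabular softmax policy `theta[t][j]` (one logit per construction step and job) stands in for the Pointer Network. The names `makespan`, `policy`, `sample_solution`, `self_labeling_step`, and `train` are introduced here for illustration only.

```python
import math
import random

# Hypothetical toy JSP instance: each job is an ordered list of (machine, duration).
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]
NUM_MACHINES = 3
NUM_STEPS = sum(len(ops) for ops in JOBS)


def makespan(sequence):
    """Completion time when the next operation of each listed job is dispatched in order."""
    machine_free = [0] * NUM_MACHINES
    job_free = [0] * len(JOBS)
    next_op = [0] * len(JOBS)
    for j in sequence:
        machine, duration = JOBS[j][next_op[j]]
        start = max(machine_free[machine], job_free[j])
        machine_free[machine] = job_free[j] = start + duration
        next_op[j] += 1
    return max(job_free)


def policy(theta_t, available):
    """Softmax over the logits of the jobs that still have operations left."""
    m = max(theta_t[j] for j in available)
    exps = [math.exp(theta_t[j] - m) for j in available]
    total = sum(exps)
    return [e / total for e in exps]


def sample_solution(theta, rng):
    """Sample one complete job sequence from the tabular policy."""
    remaining = [len(ops) for ops in JOBS]
    sequence = []
    for t in range(NUM_STEPS):
        available = [j for j, r in enumerate(remaining) if r > 0]
        j = rng.choices(available, weights=policy(theta[t], available))[0]
        sequence.append(j)
        remaining[j] -= 1
    return sequence


def self_labeling_step(theta, rng, num_samples=16, lr=0.5):
    """Sample solutions, take the best as pseudo-label, apply one cross-entropy step."""
    samples = [sample_solution(theta, rng) for _ in range(num_samples)]
    best = min(samples, key=makespan)  # pseudo-label: lowest makespan among samples
    remaining = [len(ops) for ops in JOBS]
    for t, target in enumerate(best):
        available = [j for j, r in enumerate(remaining) if r > 0]
        probs = policy(theta[t], available)
        for p, j in zip(probs, available):
            # Gradient of -log p(target) w.r.t. logit j is p_j - 1[j == target].
            theta[t][j] -= lr * (p - (1.0 if j == target else 0.0))
        remaining[target] -= 1
    return best, makespan(best)


def train(num_iters=200, seed=0):
    """Run the self-labeling loop and record the pseudo-label makespan per iteration."""
    rng = random.Random(seed)
    theta = [[0.0] * len(JOBS) for _ in range(NUM_STEPS)]
    history = [self_labeling_step(theta, rng)[1] for _ in range(num_iters)]
    return theta, history


if __name__ == "__main__":
    _, history = train()
    print("first pseudo-label makespan:", history[0])
    print("last pseudo-label makespan:", history[-1])
```

No optimal solution enters the loop at any point: the only supervision signal is the best of the model's own samples, which is the property the abstract emphasizes. In a neural setting the tabular update would presumably be replaced by a standard cross-entropy loss on the model's per-step output distribution.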