In this work, we propose a Self-Supervised training strategy specifically designed for combinatorial problems. One of the main obstacles to applying supervised paradigms to such problems is the need for expensive target solutions as ground truth, which are often produced with costly exact solvers. Inspired by Semi- and Self-Supervised learning, we show that generative models can be trained simply by sampling multiple solutions and using the best one, according to the problem objective, as a pseudo-label. In this way, we iteratively improve the model's generation capability by relying only on its own self-supervision, completely removing the need for optimality information. We demonstrate the effectiveness of this Self-Labeling strategy on the Job Shop Scheduling Problem (JSP), a complex combinatorial problem that is receiving much attention from the Reinforcement Learning community. We propose a generative model based on the well-known Pointer Network and train it with our strategy. Experiments on popular benchmarks demonstrate the potential of this approach, as the resulting models outperform constructive heuristics and current state-of-the-art learning proposals for the JSP.
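The Self-Labeling loop described in the abstract (sample several solutions, keep the best one under the problem objective, and treat it as a pseudo-label for a supervised update) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy instance `JOBS`, the function names, the hyperparameters, and in particular the model, which is a plain table of per-step logits rather than the Pointer Network used in the paper. The update is one gradient step on the cross-entropy of the best sampled sequence.

```python
import math
import random

# Hypothetical toy instance: each job is a list of (machine, duration) operations.
JOBS = [
    [(0, 3), (1, 2)],
    [(1, 2), (0, 4)],
    [(0, 2), (1, 3)],
]

def makespan(jobs, sequence):
    """Schedule operations in the given job order and return the makespan."""
    next_op = [0] * len(jobs)     # index of the next operation of each job
    job_ready = [0] * len(jobs)   # time at which each job becomes free
    machine_ready = {}            # time at which each machine becomes free
    for j in sequence:
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
    return max(job_ready)

def feasible_probs(logits_t, remaining):
    """Softmax restricted to jobs that still have unscheduled operations."""
    feasible = [j for j, r in enumerate(remaining) if r > 0]
    top = max(logits_t[j] for j in feasible)
    weights = [math.exp(logits_t[j] - top) for j in feasible]
    total = sum(weights)
    return feasible, [w / total for w in weights]

def sample_solution(jobs, logits, rng):
    """Construct one solution step by step by sampling from the model."""
    remaining = [len(ops) for ops in jobs]
    sequence = []
    for t in range(sum(remaining)):
        feasible, probs = feasible_probs(logits[t], remaining)
        j = rng.choices(feasible, weights=probs)[0]
        sequence.append(j)
        remaining[j] -= 1
    return sequence

def self_labeling_step(jobs, logits, rng, num_samples=8, lr=0.5):
    """Sample solutions, pick the best as pseudo-label, do one update."""
    samples = [sample_solution(jobs, logits, rng) for _ in range(num_samples)]
    best = min(samples, key=lambda s: makespan(jobs, s))
    remaining = [len(ops) for ops in jobs]
    for t, target in enumerate(best):
        feasible, probs = feasible_probs(logits[t], remaining)
        for j, p in zip(feasible, probs):
            # Gradient of log p(target) w.r.t. logit j is 1[j == target] - p.
            logits[t][j] += lr * ((1.0 if j == target else 0.0) - p)
        remaining[target] -= 1
    return best, makespan(jobs, best)

def train(jobs, steps=200, seed=0):
    """Run the self-labeling loop; no optimal solutions are ever provided."""
    rng = random.Random(seed)
    num_ops = sum(len(ops) for ops in jobs)
    logits = [[0.0] * len(jobs) for _ in range(num_ops)]
    best_seen = None
    for _ in range(steps):
        _, value = self_labeling_step(jobs, logits, rng)
        best_seen = value if best_seen is None else min(best_seen, value)
    return logits, best_seen
```

Calling `train(JOBS)` returns the learned logits and the best makespan encountered during training. A tabular model like this cannot generalize across instances, so it only illustrates the training signal; a learned encoder-decoder such as the one proposed in the paper would take the place of the `logits` table.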