The scheduling of production resources (such as assigning jobs to machines) plays a vital role in the manufacturing industry, not only for saving energy but also for increasing overall efficiency. Among the various job scheduling problems, this work addresses the job shop scheduling problem (JSSP). The JSSP is an NP-hard combinatorial optimization problem (COP), for which exhaustive search quickly becomes infeasible. Simple heuristics such as first in, first out (FIFO) and longest processing time (LPT), and metaheuristics such as tabu search, are often adopted to solve the problem by truncating the search space. For large problem sizes, however, these methods become inefficient: their solutions are either far from the optimum or too time-consuming to obtain. In recent years, research on using deep reinforcement learning (DRL) to solve COPs has gained interest and has shown promising results in terms of solution quality and computational efficiency. In this work, we provide a novel DRL approach to the JSSP, examining both generalization and solution effectiveness. In particular, we employ the proximal policy optimization (PPO) algorithm, which follows the policy-gradient paradigm and has been found to perform well in the constrained dispatching of jobs. We incorporate an order swapping mechanism (OSM) into the environment to achieve better generalized learning of the problem. The performance of the presented approach is analyzed in depth on a set of publicly available benchmark instances, and our results are compared with those reported by other groups.
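To make the baseline heuristics concrete, the following is a minimal sketch of priority-rule dispatching for the JSSP, not the method proposed in this work. It assumes a hypothetical instance encoding in which each job is an ordered list of `(machine_id, processing_time)` operations, and a hypothetical `dispatch` function that greedily schedules one operation at a time at the earliest time both its job and its machine are free. Here FIFO is interpreted as "pick the job whose next operation has been ready longest" and LPT as "pick the job whose next operation has the longest processing time"; both interpretations are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical encoding: a job is an ordered list of (machine_id, processing_time).
Job = List[Tuple[int, int]]


def dispatch(jobs: List[Job], rule: str) -> int:
    """Greedy list scheduling driven by a priority rule; returns the makespan."""
    next_op = [0] * len(jobs)        # index of each job's next unscheduled operation
    job_ready = [0] * len(jobs)      # time at which each job's previous operation ends
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    machine_ready = [0] * n_machines # time at which each machine becomes free
    remaining = sum(len(job) for job in jobs)

    while remaining:
        # Jobs that still have operations left to schedule.
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        if rule == "FIFO":
            # Earliest job-ready time first; ties broken by lower job index.
            j = min(candidates, key=lambda j: (job_ready[j], j))
        elif rule == "LPT":
            # Longest next-operation processing time first; ties by lower job index.
            j = max(candidates, key=lambda j: (jobs[j][next_op[j]][1], -j))
        else:
            raise ValueError(f"unknown rule: {rule}")

        machine, duration = jobs[j][next_op[j]]
        # Respect both the job's precedence and the machine's availability.
        start = max(job_ready[j], machine_ready[machine])
        end = start + duration
        job_ready[j] = machine_ready[machine] = end
        next_op[j] += 1
        remaining -= 1

    return max(job_ready)


if __name__ == "__main__":
    # Small 3-job, 3-machine instance used only for illustration.
    jobs = [
        [(0, 3), (1, 2), (2, 2)],
        [(0, 2), (2, 1), (1, 4)],
        [(1, 4), (2, 3)],
    ]
    print(dispatch(jobs, "FIFO"))  # 12
    print(dispatch(jobs, "LPT"))   # 14
```

Different rules yield different makespans on the same instance, and neither is guaranteed to be optimal. This sensitivity to the chosen rule is what motivates learning a dispatching policy instead of fixing one by hand.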