The scheduling of production resources (such as assigning jobs to machines) plays a vital role in the manufacturing industry, not only for saving energy but also for increasing overall efficiency. Among the different job scheduling problems, this work addresses the job shop scheduling problem (JSSP). The JSSP belongs to the class of NP-hard combinatorial optimization problems (COPs), for which solving the problem through exhaustive search becomes infeasible. Simple heuristics such as first in, first out (FIFO) and longest processing time (LPT), as well as metaheuristics such as tabu search, are often adopted to solve the problem by truncating the search space. However, these methods become inefficient for large problem sizes, as they either yield solutions far from the optimum or are time-consuming. In recent years, research on using deep reinforcement learning (DRL) to solve COPs has gained interest and has shown promising results in terms of solution quality and computational efficiency. In this work, we provide a novel DRL approach to solving the JSSP that examines two objectives: generalization and solution effectiveness. In particular, we employ the proximal policy optimization (PPO) algorithm, which follows the policy-gradient paradigm and has been found to perform well in the constrained dispatching of jobs. We incorporate an order swapping mechanism (OSM) into the environment to achieve better generalized learning of the problem. The performance of the presented approach is analyzed in depth using a set of publicly available benchmark instances and by comparing our results with those reported by other groups.
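To make the role of simple dispatching heuristics concrete, the following is a minimal, hypothetical sketch of priority-rule dispatching for the JSSP; it is not the authors' environment or their PPO agent. It assumes each job is an ordered list of `(machine, processing_time)` operations, greedily schedules one operation at a time at its earliest feasible start, and returns the makespan. The helper names (`dispatch`, `fifo_priority`, `lpt_priority`) and the toy instance are illustrative assumptions, and FIFO is interpreted here as "the operation that became ready earliest".

```python
from typing import Callable, List, Tuple

# Hypothetical instance format: jobs[j] is the ordered list of (machine, processing_time)
# operations of job j; each job must visit its machines in this order.
Instance = List[List[Tuple[int, int]]]


def fifo_priority(job: int, next_op: Tuple[int, int], job_ready: int) -> Tuple[int, int]:
    # FIFO (assumed interpretation): prefer the operation that became ready earliest,
    # breaking ties by the lower job index.
    return (job_ready, job)


def lpt_priority(job: int, next_op: Tuple[int, int], job_ready: int) -> Tuple[int, int]:
    # LPT: prefer the operation with the longest processing time,
    # breaking ties by the lower job index.
    _, duration = next_op
    return (-duration, job)


def dispatch(jobs: Instance, num_machines: int, priority: Callable) -> int:
    """Greedily build a schedule with a priority dispatching rule and return its makespan."""
    next_idx = [0] * len(jobs)          # index of each job's next unscheduled operation
    job_ready = [0] * len(jobs)         # finish time of each job's previous operation
    machine_ready = [0] * num_machines  # time at which each machine becomes free
    remaining = sum(len(ops) for ops in jobs)

    while remaining > 0:
        # Candidates are the jobs that still have operations left to schedule.
        candidates = [j for j in range(len(jobs)) if next_idx[j] < len(jobs[j])]
        # The rule ranks candidates; the smallest key is dispatched next.
        job = min(candidates, key=lambda j: priority(j, jobs[j][next_idx[j]], job_ready[j]))
        machine, duration = jobs[job][next_idx[job]]
        # Earliest feasible start respects both the job's precedence and machine availability.
        start = max(job_ready[job], machine_ready[machine])
        end = start + duration
        job_ready[job] = end
        machine_ready[machine] = end
        next_idx[job] += 1
        remaining -= 1

    return max(job_ready)


# Toy 3-job, 3-machine instance (made up for illustration only).
jobs: Instance = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]

print(dispatch(jobs, 3, fifo_priority))  # 12
print(dispatch(jobs, 3, lpt_priority))   # 14
```

On this toy instance the two rules produce different makespans (12 vs. 14), which illustrates why a fixed rule can land far from the optimum and why learning a dispatching policy, as with DRL, is attractive.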