Solving job shop scheduling problems (JSSPs) with a fixed strategy, such as a priority dispatching rule, may yield satisfactory results for some problem instances but insufficient results for others. From this single-strategy perspective, finding a near-optimal solution to a specific JSSP varies in difficulty even if the machine setup remains the same. A recent, intensively researched, and promising method for dealing with this variability in difficulty is deep reinforcement learning (DRL), which dynamically adjusts an agent's planning strategy in response to difficult instances, not only during training but also when applied to new situations. In this paper, we further improve DRL as an underlying method by actively incorporating the variability of difficulty within the same problem size into the design of the learning process. We base our approach on a state-of-the-art methodology that solves JSSPs by means of DRL and graph neural network embeddings. Our work supplements the agent's training routine with a curriculum learning strategy that ranks the problem instances shown during training by a new metric of problem instance difficulty. Our results show that certain curricula lead to significantly better performance of the DRL solutions. Agents trained on these curricula beat the top performance of those trained on randomly distributed training data, reaching 3.2% shorter average makespans.
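The curriculum idea described in the abstract, ranking same-size training instances by a difficulty score before showing them to the agent, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the helpers `rank_by_difficulty` and `curriculum_batches`, the instance names, and the scores are hypothetical, and the scores are made-up placeholders rather than the difficulty metric proposed in the paper, which the abstract does not define.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def rank_by_difficulty(
    instances: Sequence[T],
    difficulty: Callable[[T], float],
    hardest_first: bool = False,
) -> List[T]:
    """Return the instances sorted by the supplied difficulty score.

    `difficulty` is a stand-in for whatever metric is used; it is NOT
    the metric from the paper.
    """
    return sorted(instances, key=difficulty, reverse=hardest_first)


def curriculum_batches(ranked: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive training batches in curriculum order."""
    for start in range(0, len(ranked), batch_size):
        yield list(ranked[start:start + batch_size])


# Made-up difficulty scores for five hypothetical instances of one problem size.
toy_scores = {
    "inst_a": 0.42,
    "inst_b": 0.10,
    "inst_c": 0.77,
    "inst_d": 0.31,
    "inst_e": 0.55,
}

# One possible curriculum: easiest instances first.
easy_to_hard = rank_by_difficulty(list(toy_scores), lambda name: toy_scores[name])
batches = list(curriculum_batches(easy_to_hard, batch_size=2))
```

Passing `hardest_first=True` gives the reverse ordering, so different curricula can be compared against a randomly shuffled baseline. Which ordering performs best is an empirical question that this sketch does not answer.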