Reinforcement learning (RL) is increasingly adopted for job shop scheduling problems (JSSP). However, RL for JSSP usually uses a vectorized representation of machine features as the state space. This representation has three major problems: (1) it does not fully capture the relationship between the machine units and the job sequence, (2) the state space grows exponentially with the number of machines and jobs, and (3) the agent generalizes poorly to unseen scenarios. We present a novel framework, GraSP-RL: a GRAph neural network-based Scheduler for Production planning problems using Reinforcement Learning. It represents the JSSP as a graph and trains the RL agent on features extracted by a graph neural network (GNN). While the graph itself lies in a non-Euclidean space, the features extracted by the GNN provide a rich encoding of the current production state in Euclidean space, which the RL agent then uses to select the next job. Further, we cast the scheduling problem as a decentralized optimization problem in which the learning agent is assigned to all the production units and learns asynchronously from the data collected on all of them. GraSP-RL is applied to a complex injection molding production environment with 30 jobs and 4 machines, where the task is to minimize the makespan of the production plan. The schedule planned by GraSP-RL is compared against first-in-first-out (FIFO), a priority dispatch rule, and against the metaheuristics tabu search (TS) and genetic algorithm (GA). GraSP-RL outperforms FIFO, TS, and GA on the trained task of planning 30 jobs in the JSSP. We further test the generalization capability of the trained agent on two different problem classes, the open shop system (OSS) and the reactive JSSP (RJSSP), where our method produces better results than FIFO and results comparable to TS and GA.
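To make the makespan objective and the FIFO baseline concrete, the following is a minimal Python sketch of FIFO dispatching on a toy job shop. It is an illustration only, not the GraSP-RL implementation or the injection molding environment: the function name `fifo_makespan`, the three-job instance, and the assumption that all jobs are released at time 0 in index order are hypothetical choices made for this example.

```python
import heapq
from collections import deque


def fifo_makespan(jobs):
    """Simulate FIFO dispatching and return the makespan.

    jobs: list of routes; each route is an ordered list of
          (machine_id, processing_time) operations.
    Assumption (not from the paper): every job is released at time 0,
    and jobs enter their first queue in job-index order.
    """
    machines = {m for route in jobs for m, _ in route}
    queues = {m: deque() for m in machines}   # waiting (job, op) per machine
    busy = {m: False for m in machines}
    events = []                               # heap of (finish_time, job, op)
    makespan = 0

    def start_if_idle(machine, now):
        # FIFO rule: an idle machine takes the longest-waiting operation.
        if not busy[machine] and queues[machine]:
            job_id, op = queues[machine].popleft()
            busy[machine] = True
            duration = jobs[job_id][op][1]
            heapq.heappush(events, (now + duration, job_id, op))

    # Release all jobs into the queue of their first machine.
    for job_id, route in enumerate(jobs):
        if route:
            queues[route[0][0]].append((job_id, 0))
    for m in machines:
        start_if_idle(m, 0)

    # Process completion events in time order.
    while events:
        now, job_id, op = heapq.heappop(events)
        makespan = max(makespan, now)
        machine = jobs[job_id][op][0]
        busy[machine] = False
        if op + 1 < len(jobs[job_id]):
            next_machine = jobs[job_id][op + 1][0]
            queues[next_machine].append((job_id, op + 1))
            start_if_idle(next_machine, now)
        start_if_idle(machine, now)
    return makespan


# Hypothetical toy instance: 3 jobs, 2 machines (ids 0 and 1).
toy_jobs = [
    [(0, 3), (1, 2)],
    [(0, 2), (1, 4)],
    [(1, 3), (0, 1)],
]
print(fifo_makespan(toy_jobs))
```

The makespan is the completion time of the last operation, so any scheduler, learned or rule-based, can be scored by the same quantity. An RL dispatcher would replace the `popleft()` choice with a learned selection over the waiting operations.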