Scheduling problems pose significant challenges in resource, industrial, and operational management. This paper addresses the Unrelated Parallel Machine Scheduling Problem (UPMS) with setup times and resource constraints using a Multi-Agent Reinforcement Learning (MARL) approach. The study introduces the Reinforcement Learning environment and conducts empirical analyses comparing MARL with Single-Agent algorithms. The experiments employ various deep neural network policies for both Single- and Multi-Agent approaches. Results demonstrate the efficacy of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in Single-Agent scenarios and of the Multi-Agent PPO algorithm in Multi-Agent setups. While Single-Agent algorithms perform adequately in reduced scenarios, Multi-Agent approaches face challenges in cooperative learning yet exhibit scalable capacity. This research contributes insights into applying MARL techniques to scheduling optimization, emphasizing the need to balance algorithmic sophistication with scalability in intelligent scheduling solutions.