This paper proposes a policy-based deep reinforcement learning hyper-heuristic framework for solving the Job Shop Scheduling Problem (JSSP). The hyper-heuristic agent learns to dynamically switch scheduling rules based on the system state. We extend the hyper-heuristic framework with two key mechanisms. First, action prefiltering restricts decision-making to feasible low-level actions, enabling low-level heuristics to be evaluated independently of environmental constraints and thus assessed without bias. Second, a commitment mechanism regulates the frequency of heuristic switching. We investigate the impact of different commitment strategies, from step-wise switching to full-episode commitment, on both training behavior and makespan. Additionally, we compare two action selection strategies at the policy level: deterministic greedy selection and stochastic sampling. Computational experiments on standard JSSP benchmarks demonstrate that the proposed approach outperforms traditional heuristics, metaheuristics, and recent neural network-based scheduling methods.
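The decision loop the abstract describes — a policy choosing among low-level heuristics, with action prefiltering and a commitment parameter governing switching frequency — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single-machine toy environment, the SPT/LPT rules standing in for the low-level heuristics, and the state passed to the policy are all illustrative assumptions.

```python
# Minimal sketch of a rule-switching hyper-heuristic loop.
# All names (rules, toy environment, state) are illustrative assumptions.

def spt(ops):   # Shortest Processing Time rule
    return min(ops, key=lambda o: o["proc"])

def lpt(ops):   # Longest Processing Time rule
    return max(ops, key=lambda o: o["proc"])

RULES = [spt, lpt]

def run_episode(jobs, policy, commitment=1):
    """Dispatch all jobs on one machine.

    `policy(state) -> rule index` is queried only when the current
    commitment expires; `commitment=1` is step-wise switching, and
    `commitment=len(jobs)` reproduces full-episode commitment.
    """
    pending, t = list(jobs), 0
    rule, hold = None, 0
    while pending:
        # Action prefiltering: only released operations are candidates,
        # so every rule is evaluated on the same feasible action set.
        feasible = ([o for o in pending if o["release"] <= t]
                    or [min(pending, key=lambda o: o["release"])])
        if hold == 0:
            # Greedy or sampled selection would both plug in here.
            rule = RULES[policy(len(pending))]
            hold = commitment
        op = rule(feasible)
        pending.remove(op)
        t = max(t, op["release"]) + op["proc"]
        hold -= 1
    return t  # makespan on the single machine
```

A larger commitment value reduces how often the policy is queried: with four operations and `commitment=2`, the policy selects a rule twice per episode instead of four times.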