Recent years have seen growing interest in using machine learning, particularly reinforcement learning (RL), for production scheduling problems of varying complexity. The general approach is to formulate the scheduling problem as a Markov Decision Process (MDP) and then train an RL agent in a simulation that implements this MDP. Existing studies rely on simulations, some of them complex, whose code is not available. As a result, the reported experiments are hard to reproduce, and in stochastic environments they cannot be reproduced accurately at all. Furthermore, there is a vast array of RL designs to choose from. Two things are prerequisites for making RL methods widely applicable to production scheduling and realizing their strengths for industry: standardized model descriptions, covering both the production setup and the RL design, and a standardized validation scheme. Our contribution is threefold. First, we standardize the description of production setups used in RL studies based on established nomenclature. Second, we classify the RL design choices made in existing publications. Third, we propose recommendations for a validation scheme that focuses on reproducibility and sufficient benchmarking.
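The general approach described above can be illustrated with a minimal sketch. It assumes a hypothetical single-machine setup with stochastic processing times, and every name in it (`Job`, `SingleMachineEnv`, `spt_policy`, `fifo_policy`, `run_episode`) is illustrative only. None of these names comes from the studies discussed. The state is the current time together with the waiting jobs, an action selects the next job to process, and the reward is the negative tardiness of the completed job. Two points in the sketch relate to the validation concerns raised here. The simulation's random generator is re-seeded on every reset, so a stochastic episode can be replayed exactly. Simple dispatching rules serve as benchmark policies, against which a trained agent could be compared.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    expected_time: float  # nominal processing time
    due_date: float


class SingleMachineEnv:
    """Hypothetical toy MDP: one machine and a fixed set of jobs.

    State: current time and the IDs of the jobs still waiting.
    Action: index (into the waiting list) of the job to process next.
    Reward: negative tardiness of the job just completed.
    """

    def __init__(self, jobs, noise=0.2, seed=0):
        self.initial_jobs = list(jobs)
        self.noise = noise  # relative deviation of actual processing times
        self.seed = seed

    def reset(self):
        # Re-seeding on every reset makes stochastic episodes reproducible.
        self.rng = random.Random(self.seed)
        self.time = 0.0
        self.waiting = list(self.initial_jobs)
        return self._observe()

    def _observe(self):
        return (self.time, tuple(j.job_id for j in self.waiting))

    def step(self, action):
        job = self.waiting.pop(action)
        # The actual processing time deviates uniformly by up to +/- noise.
        factor = self.rng.uniform(1 - self.noise, 1 + self.noise)
        self.time += job.expected_time * factor
        tardiness = max(0.0, self.time - job.due_date)
        done = not self.waiting
        return self._observe(), -tardiness, done


def spt_policy(env):
    """Shortest-processing-time dispatching rule, a common heuristic baseline."""
    return min(range(len(env.waiting)), key=lambda i: env.waiting[i].expected_time)


def fifo_policy(env):
    """First-in-first-out baseline: always process the earliest waiting job."""
    return 0


def run_episode(env, policy):
    """Run one episode and return the cumulative reward (negative total tardiness)."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(policy(env))
        total += reward
    return total


if __name__ == "__main__":
    jobs = [Job(0, 4.0, 5.0), Job(1, 2.0, 3.0), Job(2, 6.0, 12.0)]
    env = SingleMachineEnv(jobs, noise=0.2, seed=42)
    # The same seed yields the same stochastic episode, so results can be replayed.
    print(run_episode(env, spt_policy) == run_episode(env, spt_policy))
    for name, policy in [("SPT", spt_policy), ("FIFO", fifo_policy)]:
        print(name, run_episode(env, policy))
```

In a full study, a trained RL agent would take the place of the `policy` argument. Its episode returns would then be compared with the heuristic baselines over the same fixed seeds.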