Highly automated assembly lines enable significant productivity gains in the manufacturing industry, particularly under mass production conditions. Nonetheless, challenges persist in job scheduling for make-to-job and mass customization, and further investigation is needed to improve efficiency, reduce tardiness, and promote safety and reliability. In this contribution, an advantage actor-critic based reinforcement learning method is proposed to address scheduling problems of distributed flexible assembly lines in real time. To enhance performance, a more condensed environment representation is proposed, which is designed to work with masks generated by priority dispatching rules to produce a fixed and advantageous action space. Moreover, a Monte-Carlo tree search based soft shielding component is developed to help address unsafe behaviors that depend on long action sequences and to monitor the risk of overdue scheduling. Finally, the proposed algorithm and its soft shielding component are validated through performance evaluation.
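The abstract describes combining a fixed action space with masks generated by priority dispatching rules. A minimal sketch of one way this could work is given below; it is not the authors' implementation. It assumes a fixed number of action slots, one per queue position, and a hypothetical mask that admits only the jobs ranked first by two common dispatching rules, shortest processing time (SPT) and earliest due date (EDD). The function names `pdr_mask` and `masked_softmax`, the choice of rules, and the slot layout are all illustrative assumptions.

```python
import numpy as np


def pdr_mask(proc_times, due_dates, num_slots, top_k=1):
    """Hypothetical mask over a fixed action space of `num_slots` slots.

    Slot i is admissible if job i is among the `top_k` jobs under SPT
    or under EDD. Slots without a job stay masked out.
    """
    n = len(proc_times)
    if n > num_slots:
        raise ValueError("more jobs than action slots")
    mask = np.zeros(num_slots, dtype=bool)
    if n == 0:
        return mask
    k = min(top_k, n)
    spt = np.argsort(proc_times, kind="stable")[:k]  # shortest processing time
    edd = np.argsort(due_dates, kind="stable")[:k]   # earliest due date
    mask[spt] = True
    mask[edd] = True
    return mask


def masked_softmax(logits, mask):
    """Turn actor logits into probabilities, giving masked slots probability 0."""
    if not mask.any():
        raise ValueError("no admissible action")
    masked = np.where(mask, logits, -np.inf)
    shifted = masked - masked[mask].max()  # subtract max for numerical stability
    exp = np.exp(shifted)                  # exp(-inf) evaluates to 0
    return exp / exp.sum()


# Four waiting jobs in an action space with six slots (illustrative numbers).
proc_times = np.array([5.0, 2.0, 9.0, 4.0])
due_dates = np.array([30.0, 50.0, 12.0, 40.0])
mask = pdr_mask(proc_times, due_dates, num_slots=6, top_k=1)

logits = np.zeros(6)  # stand-in for the actor network's output
probs = masked_softmax(logits, mask)

rng = np.random.default_rng(0)
action = rng.choice(len(probs), p=probs)  # sampled slot index
```

With these numbers, only job 1 (best under SPT) and job 2 (best under EDD) remain admissible, so the policy's output size stays constant while the mask restricts sampling to rule-approved candidates. In an actor-critic setting the same mask would typically be applied before computing log-probabilities for the policy gradient.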