This paper compares two well-known deep Reinforcement Learning (RL) algorithms, Deep Q-Learning (DQN) and Proximal Policy Optimization (PPO), in a simulated production system. We use a Petri Net (PN)-based simulation environment that was proposed in previous related work. The two algorithms are compared on several evaluation metrics, including the average percentage of correctly assembled and sorted products, the average episode length, and the percentage of successful episodes. The results show that PPO outperforms DQN on all evaluation metrics. The study highlights the advantages of policy-based algorithms in problems with high-dimensional state and action spaces. It contributes to the field of deep RL in the context of production systems by providing insights into the effectiveness of different algorithms and their suitability for different tasks.