Machine learning algorithms have been gaining relevance as a way to address the increasing modelling complexity of manufacturing decision-making problems. Reinforcement learning is a methodology with great potential because it reduces the need for prior training data, i.e., the system learns over time during actual operation. This study focuses on the implementation of a reinforcement learning algorithm for the assembly problem of a given object, aiming to assess the effectiveness of the proposed approach in optimising the assembly process time. A model-free Q-Learning algorithm is applied, in which a matrix of Q-values (Q-table) is learned from successive interactions with the environment and then used to suggest an assembly sequence. The implementation explores three scenarios of increasing complexity, so that the impact of the Q-Learning parameters and rewards can be assessed and the performance of the reinforcement learning agent improved. The optimisation approach achieved very promising results, learning the optimal assembly sequence 98.3% of the time.
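As an illustration of how a Q-table can be learned from repeated interaction and then read out as an assembly sequence, the following is a minimal sketch of tabular Q-Learning on a toy problem. It is not the study's implementation: the four-part object, the `STEP_TIME` values, the state encoding (the tuple of parts mounted so far), the reward (negative step time) and the parameter values `alpha`, `gamma` and `epsilon` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

N_PARTS = 4
# Hypothetical step times (not taken from the study): STEP_TIME[prev + 1][part]
# is the time to mount `part` right after `prev`; prev = -1 means nothing is
# mounted yet. Diagonal entries are never used.
STEP_TIME = [
    [4, 5, 1, 3],  # first part
    [0, 4, 3, 1],  # after part 0
    [3, 0, 4, 5],  # after part 1
    [1, 3, 0, 4],  # after part 2
    [5, 1, 3, 0],  # after part 3
]


def step_time(state, part):
    """Time to mount `part` given the tuple of parts mounted so far."""
    prev = state[-1] if state else -1
    return STEP_TIME[prev + 1][part]


def remaining(state):
    """Parts that can still be mounted (the available actions)."""
    return [p for p in range(N_PARTS) if p not in state]


def train(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy tabular Q-Learning."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table: (state, action) -> value, starts at 0
    for _ in range(episodes):
        state = ()
        while len(state) < N_PARTS:
            actions = remaining(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            reward = -step_time(state, action)  # shorter steps are better
            next_state = state + (action,)
            # Value of the best follow-up action; 0 once the object is complete.
            future = max(
                (q[(next_state, a)] for a in remaining(next_state)),
                default=0.0,
            )
            q[(state, action)] += alpha * (
                reward + gamma * future - q[(state, action)]
            )
            state = next_state
    return q


def greedy_sequence(q):
    """Read the suggested assembly sequence out of the Q-table."""
    state = ()
    while len(state) < N_PARTS:
        state += (max(remaining(state), key=lambda a: q[(state, a)]),)
    return list(state)


def total_time(sequence):
    """Total assembly time of a complete sequence of parts."""
    return sum(step_time(tuple(sequence[:i]), p) for i, p in enumerate(sequence))


if __name__ == "__main__":
    q_table = train()
    best = greedy_sequence(q_table)
    print("suggested sequence:", best, "total time:", total_time(best))
```

Because the rewards are negative and the Q-table starts at zero, untried actions initially look attractive, which pushes the agent to try every option before settling; a different reward design or initialisation would change how much explicit exploration (`epsilon`) is needed.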