For the Industry 4.0 revolution, cooperative autonomous mobility systems based on multi-agent reinforcement learning (MARL) are widely used. However, MARL-based algorithms suffer from heavy parameter usage and convergence difficulties when there are many agents. To tackle these problems, we propose a quantum MARL (QMARL) algorithm based on the actor-critic network concept, which is beneficial in terms of scalability and copes with the limitations of the noisy intermediate-scale quantum (NISQ) era. Our QMARL is also beneficial in terms of efficient parameter utilization and fast convergence due to quantum supremacy. The reward in our QMARL is defined as task precision over computation time across multiple agents, so that multi-agent cooperation can be realized. For further improvement, we propose an additional scalability technique called projection value measure (PVM). With PVM, the proposed QMARL achieves the highest reward by reducing the action dimension to a logarithmic scale. We conclude that the proposed QMARL with PVM outperforms the other algorithms in terms of efficient parameter utilization, fast convergence, and scalability.
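The logarithmic reduction attributed to PVM can be illustrated with a small sketch. It assumes one common reading of the idea: measuring n qubits in the computational basis yields one of 2^n outcomes, so each outcome can index one action, and an action space of size |A| needs only about log2 |A| qubits. The function names below are hypothetical, and the direct outcome-to-action mapping is an assumption for illustration, not the paper's implementation.

```python
import random


def qubits_for_actions(num_actions: int) -> int:
    """Smallest n (at least 1) such that 2**n basis outcomes cover num_actions."""
    if num_actions < 1:
        raise ValueError("num_actions must be positive")
    # (num_actions - 1).bit_length() equals ceil(log2(num_actions)) for integers.
    return max(1, (num_actions - 1).bit_length())


def action_probabilities(amplitudes):
    """Born rule: outcome k has probability |amplitude_k|**2 (normalized here)."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(amplitudes, rng=random):
    """Simulate one projective measurement; the outcome index is the action."""
    probs = action_probabilities(amplitudes)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # 16 discrete actions fit in the outcomes of a 4-qubit measurement.
    print(qubits_for_actions(16))
    # A uniform superposition over 4 basis states gives each action probability 0.25.
    print(action_probabilities([1, 1, 1, 1]))
```

Under this assumption, the number of qubits grows with the logarithm of the action count, not with the action count itself, which is the scaling behavior the abstract claims for PVM.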