Multi-agent reinforcement learning (MARL) studies principles that apply to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, combined with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicit information sharing among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.