Multi-Agent Reinforcement Learning (MARL) is a promising candidate for efficient control of microscopic particles, of which micro-robots are a subset. However, the microscopic environment presents unique challenges, such as Brownian motion at sufficiently small length scales. In this work, we explore the role of temperature in the emergence and efficacy of strategies in MARL systems, using particle-based Langevin molecular dynamics simulations as a realistic representation of micro-scale environments. To this end, we perform experiments on two multi-agent tasks in microscopic environments at different temperatures: detecting the source of a concentration gradient and rotating a rod. We find that at higher temperatures the reinforcement learning (RL) agents identify new strategies for achieving these tasks, which highlights the importance of understanding this regime and provides insight into optimal training strategies for bridging the generalization gap between simulation and reality. To accompany our results, we also introduce a novel Python package for studying microscopic agents using RL.