Multi-Agent Reinforcement Learning (MARL) is a promising candidate for realizing efficient control of microscopic particles, of which micro-robots are a subset. However, the environment of microscopic particles presents unique challenges, such as Brownian motion at sufficiently small length scales. In this work, we explore the role of temperature in the emergence and efficacy of strategies in MARL systems, using particle-based Langevin molecular dynamics simulations as a realistic representation of micro-scale environments. To this end, we perform experiments on two multi-agent tasks in microscopic environments at different temperatures: detecting the source of a concentration gradient and rotating a rod. We find that at higher temperatures, the reinforcement learning (RL) agents identify new strategies for achieving these tasks, highlighting the importance of understanding this regime and providing insight into optimal training strategies for bridging the generalization gap between simulation and reality. To accompany our results, we also introduce a novel Python package for studying microscopic agents using RL.
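To illustrate why temperature matters at these length scales, the following is a minimal sketch of overdamped Langevin (Brownian) dynamics for free, non-interacting spheres in two dimensions. It is not the package introduced in this work: the function names, the default particle radius and viscosity, and the assumption that viscosity stays fixed as temperature changes are all hypothetical choices made for illustration. It uses the Stokes friction coefficient and the Einstein relation, so the thermal step size grows with the square root of temperature.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K


def simulate_brownian(n_particles, n_steps, dt, temperature,
                      radius=1e-6, viscosity=1e-3, seed=0):
    """Simulate free 2D Brownian motion (overdamped Langevin dynamics).

    All defaults are illustrative assumptions: a 1 micrometre sphere in a
    water-like fluid whose viscosity is held fixed for every temperature.
    Returns the positions after each step, with shape
    (n_steps, n_particles, 2), and the diffusion coefficient in m^2/s.
    """
    rng = np.random.default_rng(seed)
    gamma = 6.0 * np.pi * viscosity * radius   # Stokes friction coefficient
    diffusion = K_B * temperature / gamma      # Einstein relation
    step_std = np.sqrt(2.0 * diffusion * dt)   # std of one step per axis
    steps = rng.normal(0.0, step_std, size=(n_steps, n_particles, 2))
    # All particles start at the origin; positions are the summed steps.
    return np.cumsum(steps, axis=0), diffusion


def mean_squared_displacement(trajectory):
    """Mean squared distance from the origin at the final time step."""
    return float(np.mean(np.sum(trajectory[-1] ** 2, axis=-1)))


if __name__ == "__main__":
    for temperature in (150.0, 300.0, 600.0):
        traj, diffusion = simulate_brownian(2000, 200, 1e-3, temperature)
        msd = mean_squared_displacement(traj)
        print(f"T = {temperature:5.1f} K  D = {diffusion:.3e} m^2/s  "
              f"MSD = {msd:.3e} m^2")
```

In this sketch the mean squared displacement after a time t is expected to approach 4Dt in two dimensions, so doubling the temperature doubles the random displacement an agent must compensate for. An agent trained in such an environment therefore sees noise whose strength is set directly by the simulated temperature.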