Robot swarms offer fascinating opportunities to perform complex tasks beyond the capabilities of individual machines. Just as a swarm of ants collectively moves a large object, similar collective functions can emerge within a group of robots from individual strategies based on local sensing. However, realizing collective functions with individually controlled microrobots is particularly challenging. The robots are micrometer-sized and have a large number of degrees of freedom. Thermal noise is strong relative to their propulsion speed. Neighboring microrobots are physically coupled in complex ways, and they collide with surfaces. Here, we implement Multi-Agent Reinforcement Learning (MARL) to generate a control strategy for up to 200 microrobots whose motions are individually controlled by laser spots. During the learning process, we employ so-called counterfactual rewards that automatically assign credit to the individual microrobots, which allows for fast and unbiased training. With the help of this efficient reward scheme, the microrobot swarm learns to collectively transport a large cargo object to an arbitrary position and orientation, much like an ant swarm. We demonstrate that this flexible and versatile swarm robotic system is robust to variations in group size, the presence of malfunctioning units, and environmental noise. Such control strategies could enable the complex, automated assembly of mobile micromachines, programmable drug delivery capsules, and other advanced lab-on-a-chip applications.
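The abstract does not say how the counterfactual rewards are computed. One common form of counterfactual credit assignment is the *difference reward*, D_i = G(a) − G(a with agent i's action replaced by a default action). Here G is the shared global reward. It rewards each agent by how much the team outcome would change without its contribution. The sketch below illustrates that idea only. It is not the authors' method. It assumes a toy overdamped cargo model in which the cargo moves in proportion to the summed 2D pushing forces, and G is the reduction in cargo-to-target distance over one step. The function names, the `mobility` and `dt` parameters, and the zero-force default action are all hypothetical.

```python
import numpy as np


def global_reward(forces, cargo_pos, target, dt=1.0, mobility=1.0):
    """Hypothetical global reward G: how much closer the cargo gets to the
    target in one step, assuming (toy model) the cargo velocity is
    proportional to the sum of all robots' pushing forces."""
    new_pos = cargo_pos + mobility * dt * forces.sum(axis=0)
    return np.linalg.norm(target - cargo_pos) - np.linalg.norm(target - new_pos)


def counterfactual_rewards(forces, cargo_pos, target, default_force=None):
    """Difference reward per robot: D_i = G(actual) - G(robot i's force
    replaced by a default force; here zero force, i.e. robot i idle)."""
    n_robots, dim = forces.shape
    if default_force is None:
        default_force = np.zeros(dim)  # assumption: "do nothing" counterfactual
    g_actual = global_reward(forces, cargo_pos, target)
    rewards = np.empty(n_robots)
    for i in range(n_robots):
        counterfactual = forces.copy()
        counterfactual[i] = default_force  # remove robot i's contribution only
        rewards[i] = g_actual - global_reward(counterfactual, cargo_pos, target)
    return rewards


if __name__ == "__main__":
    cargo = np.array([0.0, 0.0])
    goal = np.array([10.0, 0.0])
    # Robot 0 pushes toward the goal, robot 1 pushes away, robot 2 pushes sideways.
    pushes = np.array([[1.0, 0.0], [-0.5, 0.0], [0.0, 1.0]])
    print(np.round(counterfactual_rewards(pushes, cargo, goal), 3))
```

In this toy example all three robots share the same global reward G. The difference reward still separates them. The robot pushing toward the goal gets positive credit. The robot pushing away gets clearly negative credit, and the sideways pusher gets a small negative value. This per-agent signal is what makes credit assignment "automatic": no hand-designed per-robot reward is needed.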