Large-scale mobile multi-robot systems can offer advantages over monolithic robots because of their higher potential for robustness and scalability. Developing controllers for multi-robot systems is challenging because the multitude of interactions is hard to anticipate and difficult to model. Automatic design using machine learning or evolutionary robotics seems to be an option for avoiding that challenge, but it brings the challenge of designing reward or fitness functions. Generic reward and fitness functions seem unlikely to exist, and task-specific rewards often have undesired side effects. Approaches based on so-called innate motivation try to avoid the specific formulation of rewards and work instead with different drivers, such as curiosity. Our approach to innate motivation is to minimize surprise, which we implement by maximizing the accuracy of the swarm robot's sensor predictions using neuroevolution. A unique advantage of the swarm robot case is that swarm members populate the robot's environment and can trigger more active behaviors in a self-referential loop. We summarize our previous simulation-based results concerning behavioral diversity, robustness, scalability, and engineered self-organization, and put them into context. In several new studies, we analyze the influence of the optimizer's hyperparameters, the scalability of evolved behaviors, and the impact of realistic robot simulations. Finally, we present results using real robots that show how the reality gap can be bridged.
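To make the idea of "maximizing the accuracy of sensor predictions" concrete, the following is a minimal sketch of how a prediction-accuracy score might be computed and used as a fitness value. It is an illustration under stated assumptions, not the authors' implementation: the function name `prediction_accuracy`, the restriction to binary sensor values, and the simple averaging over time steps and sensors are all hypothetical choices made for this example.

```python
from typing import Sequence


def prediction_accuracy(
    predictions: Sequence[Sequence[int]],
    observations: Sequence[Sequence[int]],
) -> float:
    """Return the fraction of sensor readings that were predicted correctly.

    predictions[t][r] is the value predicted for sensor r at the next step,
    and observations[t][r] is the value that sensor r then actually reported.
    Binary sensor values (0 or 1) are an assumption of this sketch.
    """
    if len(predictions) != len(observations):
        raise ValueError("predictions and observations must cover the same steps")

    correct = 0
    total = 0
    for predicted, observed in zip(predictions, observations):
        if len(predicted) != len(observed):
            raise ValueError("each step must have one prediction per sensor")
        for p, s in zip(predicted, observed):
            # For binary values, 1 - |p - s| is 1 on a match and 0 on a miss.
            correct += 1 - abs(p - s)
            total += 1

    if total == 0:
        raise ValueError("at least one sensor reading is required")
    return correct / total


# Two time steps, two sensors: three of the four predictions match.
score = prediction_accuracy([[0, 1], [1, 1]], [[0, 1], [0, 1]])
print(score)  # 0.75
```

In a neuroevolution setting, a score of this kind could serve as the fitness of a candidate controller, so that selection favors behaviors whose sensory consequences are easy to predict. No task-specific reward enters the computation, which is the point of the innate-motivation approach described above.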