Understanding how global collective behavior arises from local interactions is of central importance in both biological and physical research. Traditional agent-based models often rely on static rules that fail to capture the dynamic strategies seen in the biological world. Reinforcement learning has been proposed as a solution, but most previous methods adopt handcrafted reward functions that implicitly or explicitly encourage the emergence of swarming behaviors. In this study, we propose a minimal predator-prey coevolution framework based on mixed cooperative-competitive multiagent reinforcement learning, and adopt a reward function based solely on the fundamental survival pressure: prey receive a reward of $-1$ if caught by predators, while predators receive a reward of $+1$. Surprisingly, our analysis reveals a rich diversity of emergent behaviors for both prey and predators, including flocking and swirling behaviors for prey, as well as dispersion tactics, confusion, and marginal predation phenomena for predators. Overall, our study provides novel insights into the collective behavior of organisms and highlights potential applications in swarm robotics.
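The survival-only reward described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `survival_rewards`, the agent identifiers, and the representation of catches as `(predator_id, prey_id)` pairs are all hypothetical. It also assumes that only the predator making the catch is credited with $+1$ and that agents not involved in a catch receive $0$; the abstract does not specify either point.

```python
from typing import Dict, Iterable, List, Tuple


def survival_rewards(
    prey_ids: Iterable[str],
    predator_ids: Iterable[str],
    catches: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return per-agent rewards for one time step from catch events alone.

    `catches` lists (predator_id, prey_id) pairs for this step.
    Assumptions (not stated in the abstract): only the catching predator
    is rewarded, and uninvolved agents receive 0.
    """
    # Every agent starts the step with zero reward: no shaping terms
    # for cohesion, alignment, or distance are added.
    rewards = {agent: 0.0 for agent in list(prey_ids) + list(predator_ids)}
    for predator, prey in catches:
        rewards[predator] += 1.0  # predator: +1 for the catch
        rewards[prey] = -1.0      # prey: -1 for being caught
    return rewards


if __name__ == "__main__":
    step = survival_rewards(
        prey_ids=["prey_0", "prey_1"],
        predator_ids=["pred_0"],
        catches=[("pred_0", "prey_1")],
    )
    print(step)
```

Because the reward contains no term for grouping or alignment, any flocking or swirling seen in training would have to emerge from the survival pressure itself, which is the point the abstract makes.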