Neuroevolution (NE) has recently proven to be a competitive alternative to learning by gradient descent in reinforcement learning tasks. However, most NE methods and their associated simulation environments differ from biological evolution in three crucial ways. First, the environment is reset to its initial conditions at the end of each generation, whereas natural environments are continuously modified by their inhabitants. Second, agents reproduce based on their ability to maximize rewards within a population, whereas biological organisms reproduce and die based on internal physiological variables that depend on their resource consumption. Third, simulation environments are primarily single-agent, whereas the biological world is inherently multi-agent and evolves alongside the population. In this work we present a method for continuously evolving adaptive agents without any environment or population reset. The environment is a large grid world with complex spatiotemporal resource generation. It contains many agents, each controlled by an evolvable recurrent neural network, that reproduce locally based on their internal physiology. The entire system is implemented in JAX, allowing very fast simulation on a GPU. We show that NE can operate in an ecologically valid, non-episodic, multi-agent setting, finding sustainable collective foraging strategies in the presence of a complex interplay between ecological and evolutionary dynamics.
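To make the setting concrete, the following is a minimal JAX sketch of one non-episodic simulation step. It is an illustration under stated assumptions, not the implementation described in this work: all names (`step`, `policy`, `observe`, `init_state`, `run`), all constants, the fixed pool of agent slots, the 3x3 observation, and the random regrowth rule are hypothetical, and the resource dynamics are far simpler than the spatiotemporal generation referred to above. It shows only the core idea: agents gain energy by eating, die when energy runs out, and reproduce with mutation once energy exceeds a threshold, with no reset of the environment or the population.

```python
import jax
import jax.numpy as jnp

GRID, MAX_AGENTS, HID = 32, 64, 8      # hypothetical sizes
OBS, ACT = 9, 5                        # 3x3 resource patch; stay/up/down/left/right
N_PARAMS = OBS * HID + HID * HID + HID * ACT
MOVES = jnp.array([[0, 0], [-1, 0], [1, 0], [0, -1], [0, 1]])
COST, REPRO_AT, SIGMA, P_GROW = 0.05, 2.0, 0.02, 0.01   # hypothetical constants

def policy(params, h, obs):
    """Tiny Elman-style recurrent net: returns (new hidden state, action logits)."""
    a, b = OBS * HID, OBS * HID + HID * HID
    w_in = params[:a].reshape(OBS, HID)
    w_h = params[a:b].reshape(HID, HID)
    w_out = params[b:].reshape(HID, ACT)
    h = jnp.tanh(obs @ w_in + h @ w_h)
    return h, h @ w_out

def observe(food, pos):
    """3x3 resource patch around one agent, with wrap-around borders."""
    offs = jnp.arange(-1, 2)
    rows, cols = (pos[0] + offs) % GRID, (pos[1] + offs) % GRID
    return food[rows[:, None], cols[None, :]].reshape(-1)

def step(state, key):
    food, pos, energy, alive, params, hidden = state
    k_act, k_grow, k_mut = jax.random.split(key, 3)

    # Act: each slot runs its own network; only living agents move.
    obs = jax.vmap(observe, in_axes=(None, 0))(food, pos)
    hidden, logits = jax.vmap(policy)(params, hidden, obs)
    action = jax.random.categorical(k_act, logits)
    pos = jnp.where(alive[:, None], (pos + MOVES[action]) % GRID, pos)

    # Eat and pay a metabolic cost.
    # Simplification: agents sharing a cell each receive the full amount.
    r, c = pos[:, 0], pos[:, 1]
    live = alive.astype(food.dtype)
    energy = energy + live * (food[r, c] - COST)
    food = food.at[r, c].multiply(1.0 - live)
    alive = alive & (energy > 0.0)                  # death by starvation

    # Reproduce: pair the k-th eligible parent with the k-th free slot.
    parents = alive & (energy > REPRO_AT)
    free = ~alive
    parent_rank = jnp.cumsum(parents) - 1
    free_rank = jnp.cumsum(free) - 1
    reproduced = parents & (parent_rank < free.sum())
    is_child = free & (free_rank < parents.sum())
    parent_ids = jnp.argsort((~parents).astype(jnp.int32))   # parents listed first
    parent_of = parent_ids[jnp.clip(free_rank, 0, MAX_AGENTS - 1)]
    energy = jnp.where(reproduced, energy / 2.0, energy)     # parent keeps half
    noise = SIGMA * jax.random.normal(k_mut, params.shape, params.dtype)
    params = jnp.where(is_child[:, None], params[parent_of] + noise, params)
    pos = jnp.where(is_child[:, None], pos[parent_of], pos)  # child spawns locally
    energy = jnp.where(is_child, energy[parent_of], energy)  # child gets other half
    hidden = jnp.where(is_child[:, None], 0.0, hidden)
    alive = alive | is_child

    # Regrow resources at random cells (placeholder for richer dynamics).
    grow = jax.random.bernoulli(k_grow, P_GROW, food.shape)
    food = jnp.maximum(food, grow.astype(food.dtype))
    return (food, pos, energy, alive, params, hidden), alive.sum()

def init_state(key, n_init=16):
    k_food, k_pos, k_par = jax.random.split(key, 3)
    food = jax.random.bernoulli(k_food, 0.2, (GRID, GRID)).astype(jnp.float32)
    pos = jax.random.randint(k_pos, (MAX_AGENTS, 2), 0, GRID)
    alive = jnp.arange(MAX_AGENTS) < n_init
    energy = alive.astype(jnp.float32)              # living agents start with 1.0
    params = 0.1 * jax.random.normal(k_par, (MAX_AGENTS, N_PARAMS))
    hidden = jnp.zeros((MAX_AGENTS, HID))
    return food, pos, energy, alive, params, hidden

def run(key, n_steps=200):
    """One continuous rollout: the state is carried through every step, never reset."""
    k_init, k_run = jax.random.split(key)
    keys = jax.random.split(k_run, n_steps)
    return jax.lax.scan(step, init_state(k_init), keys)
```

Calling `final_state, population = run(jax.random.PRNGKey(0))` returns the final state and the population size at each step. The fixed pool of `MAX_AGENTS` slots with an `alive` mask is a design choice of this sketch: compiled JAX code needs static array shapes, so births and deaths are expressed as mask updates rather than by resizing arrays.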