Neuroevolution (NE) has recently proven to be a competitive alternative to learning by gradient descent in reinforcement learning tasks. However, most NE methods and their associated simulation environments differ crucially from biological evolution in three respects. First, the environment is reset to initial conditions at the end of each generation, whereas natural environments are continuously modified by their inhabitants. Second, agents reproduce based on their ability to maximize rewards within a population, whereas biological organisms reproduce and die based on internal physiological variables that depend on their resource consumption. Third, simulation environments are primarily single-agent, whereas the biological world is inherently multi-agent and evolves alongside the population. In this work, we present a method for continuously evolving adaptive agents without any environment or population reset. The environment is a large grid world with complex spatiotemporal resource generation. It contains many agents, each controlled by an evolvable recurrent neural network, that reproduce locally based on their internal physiology. The entire system is implemented in JAX, allowing very fast simulation on a GPU. We show that NE can operate in an ecologically valid, non-episodic, multi-agent setting, finding sustainable collective foraging strategies in the presence of a complex interplay between ecological and evolutionary dynamics.
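To make the non-episodic setup concrete, the following is a minimal JAX sketch of one way such a loop could look. It is not the authors' implementation. All names and constants (`G`, `N`, `COST`, `REPRO_AT`, `REGROW`, `SIGMA`, `init`, `step`, `run`) are hypothetical. A per-agent linear policy stands in for the evolvable recurrent network, and uniform regrowth stands in for the complex spatiotemporal resource generation. What the sketch does keep is the core idea: a fixed-size population array with an `alive` mask, death and reproduction driven by an internal energy variable, offspring placed next to their parent, and no reset of the environment or the population at any point.

```python
import jax
import jax.numpy as jnp

G, N = 32, 64  # grid side and maximum population (hypothetical values)
MOVES = jnp.array([[0, 0], [-1, 0], [1, 0], [0, -1], [0, 1]])  # stay/up/down/left/right
COST, REPRO_AT, REGROW, SIGMA = 0.05, 2.0, 0.02, 0.1  # hypothetical constants


def init(key):
    k1, k2, k3 = jax.random.split(key, 3)
    return dict(
        res=jax.random.uniform(k1, (G, G)),             # resource level per cell
        pos=jax.random.randint(k2, (N, 2), 0, G),       # agent coordinates
        energy=jnp.ones(N),                             # internal physiology
        alive=jnp.arange(N) < N // 2,                   # half the slots start occupied
        genome=0.1 * jax.random.normal(k3, (N, 5, 5)),  # linear policy weights
    )


@jax.jit
def step(state, key):
    k_act, k_mut = jax.random.split(key)
    res, pos, energy, alive, genome = (
        state[k] for k in ("res", "pos", "energy", "alive", "genome"))
    idx = jnp.arange(N)

    # Observe own cell and 4 neighbours (toroidal wrap), then sample a move.
    nb = (pos[:, None, :] + MOVES[None]) % G            # (N, 5, 2)
    obs = res[nb[..., 0], nb[..., 1]]                   # (N, 5)
    logits = jnp.einsum("ni,nij->nj", obs, genome)      # (N, 5)
    act = jax.random.categorical(k_act, logits)         # (N,)
    pos = jnp.where(alive[:, None], nb[idx, act], pos)

    # Living agents on the same cell split its resource; everyone pays a cost.
    r, c = pos[:, 0], pos[:, 1]
    count = jnp.zeros((G, G)).at[r, c].add(alive.astype(jnp.float32))
    share = res[r, c] / jnp.maximum(count[r, c], 1.0)
    energy = energy + jnp.where(alive, share - COST, 0.0)
    res = jnp.clip(jnp.where(count > 0, 0.0, res) + REGROW, 0.0, 1.0)

    # Death: agents whose energy is exhausted free their slot.
    alive = alive & (energy > 0.0)

    # Reproduction: the k-th eligible parent fills the k-th free slot.
    can_repro = alive & (energy > REPRO_AT)
    free = ~alive
    order = jnp.argsort(jnp.where(can_repro, idx, idx + N))  # parents first
    slot_rank = jnp.cumsum(free.astype(jnp.int32)) - 1
    born = free & (slot_rank < can_repro.sum())
    parent = order[jnp.clip(slot_rank, 0, N - 1)]
    parent_rank = jnp.cumsum(can_repro.astype(jnp.int32)) - 1
    gave_birth = can_repro & (parent_rank < free.sum())

    energy = jnp.where(gave_birth, energy / 2, energy)   # parent keeps half
    energy = jnp.where(born, energy[parent], energy)     # child gets the other half
    noise = SIGMA * jax.random.normal(k_mut, genome.shape)
    genome = jnp.where(born[:, None, None], genome[parent] + noise, genome)
    pos = jnp.where(born[:, None], pos[parent], pos)     # offspring appear locally
    alive = alive | born
    return dict(res=res, pos=pos, energy=energy, alive=alive, genome=genome)


def run(key, n_steps):
    k0, k1 = jax.random.split(key)

    def body(state, k):
        state = step(state, k)
        return state, state["alive"].sum()  # log population size per step

    # One continuous rollout: state is carried forward and never reset.
    return jax.lax.scan(body, init(k0), jax.random.split(k1, n_steps))
```

Calling `run(jax.random.PRNGKey(0), 1000)` returns the final state and the population size at every step. Because the population size is not fixed by a selection rule, it can grow, shrink, or collapse depending on how the agents' foraging affects the resource field, which is the kind of eco-evolutionary feedback the abstract describes.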