We start by discussing the link between ecosystem simulators and general AI. Then we present the open-source ecosystem simulator Ecotwin, which is based on the game engine Unity and operates on ecosystems containing inanimate objects, such as mountains and lakes, as well as organisms, such as animals and plants. Animal cognition is modeled by integrating three separate networks: (i) a reflex network for hard-wired reflexes; (ii) a happiness network that maps sensory data, such as oxygen, water, energy, and smells, to a scalar happiness value; and (iii) a policy network for selecting actions. The policy network is trained with reinforcement learning (RL), where the reward signal is defined as the change in happiness from one time step to the next. All organisms are capable of either sexual or asexual reproduction, and they die if they run out of critical resources. We report results from three studies with Ecotwin, in which natural phenomena emerge in the models without being hard-wired. First, we study a terrestrial ecosystem with wolves, deer, and grass, in which Lotka-Volterra-style population dynamics emerge. Second, we study a marine ecosystem with phytoplankton, copepods, and krill, in which diel vertical migration behavior emerges. Third, we study an ecosystem involving lethal dangers, in which certain agents that combine RL with reflexes outperform pure RL agents.
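The reward definition described in the abstract, the change in happiness between consecutive time steps, can be illustrated with a minimal sketch. This is not Ecotwin's implementation: the weighted-sum `happiness` function is a hypothetical stand-in for the happiness network, the rule that a triggered reflex overrides the policy is an assumption about how the networks are integrated, and all function names, weights, and sensor values are illustrative only.

```python
from typing import Callable, Dict, Optional

Senses = Dict[str, float]


def happiness(senses: Senses, weights: Senses) -> float:
    """Hypothetical stand-in for the happiness network: a weighted sum of senses."""
    return sum(weights.get(name, 0.0) * value for name, value in senses.items())


def reward(prev: Senses, curr: Senses, weights: Senses) -> float:
    """Reward is the happiness at the current step minus that at the previous step."""
    return happiness(curr, weights) - happiness(prev, weights)


def select_action(
    senses: Senses,
    reflex: Callable[[Senses], Optional[str]],
    policy: Callable[[Senses], str],
) -> str:
    """Assumed integration rule: a triggered reflex overrides the learned policy."""
    reflex_action = reflex(senses)
    return reflex_action if reflex_action is not None else policy(senses)


# Illustrative values only (not taken from the paper).
weights = {"oxygen": 1.0, "water": 1.0, "energy": 2.0}
before = {"oxygen": 0.5, "water": 0.5, "energy": 0.25}
after = {"oxygen": 0.5, "water": 0.5, "energy": 0.75}  # the animal has eaten

step_reward = reward(before, after, weights)  # positive, since energy increased


def flee_reflex(s: Senses) -> Optional[str]:
    """Hypothetical reflex: flee when a predator smell exceeds a threshold."""
    return "flee" if s.get("predator_smell", 0.0) > 0.5 else None


def graze_policy(s: Senses) -> str:
    """Hypothetical policy that always grazes; a trained network would go here."""
    return "graze"
```

Under this definition an agent is rewarded for improving its internal state rather than for the state's absolute level, so a well-fed animal that merely stays well fed receives a reward of zero.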