Animals often demonstrate a remarkable ability to adapt to their environments during their lifetime, partly thanks to the evolution of morphological and neural structures. These structures capture features of the environment that are shared across generations, biasing and speeding up lifetime learning. In this work, we propose a computational model for studying a mechanism that can enable such a process. We adopt a computational framework based on meta-reinforcement learning as a model of the interplay between evolution and development. At the evolutionary scale, we evolve reservoirs, a family of recurrent neural networks that differ from conventional networks in that one optimizes not the synaptic weights, but hyperparameters controlling macro-level properties of the resulting network architecture. At the developmental scale, we employ these evolved reservoirs to facilitate the learning of a behavioral policy through reinforcement learning (RL). Within an RL agent, the reservoir encodes the environment state before passing it to an action policy. We evaluate our approach on several 2D and 3D simulated environments. Our results show that the evolution of reservoirs can improve the learning of diverse, challenging tasks. In particular, we study three hypotheses: an architecture combining reservoirs and reinforcement learning could (1) solve tasks with partial observability, (2) generate oscillatory dynamics that facilitate the learning of locomotion tasks, and (3) enable the generalization of learned behaviors to new tasks unknown during the evolution phase.
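To make the reservoir idea concrete, the following is a minimal sketch of an echo-state-style reservoir encoding observations for a downstream policy. It is an illustration of the general technique, not the authors' implementation: the function names, dimensions, and the choice of `spectral_radius` and `input_scale` as the evolvable macro-level hyperparameters are assumptions for this example. The recurrent and input weights themselves stay random and untrained.

```python
import numpy as np

def make_reservoir(n_inputs, n_units, spectral_radius=0.9, input_scale=1.0, seed=0):
    """Build a fixed random reservoir. Only macro-level hyperparameters
    (n_units, spectral_radius, input_scale) would be tuned by evolution;
    the individual synaptic weights are never optimized."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, n_units))
    # Rescale recurrent weights so the largest eigenvalue magnitude
    # matches the desired spectral radius (controls memory and stability).
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.normal(size=(n_units, n_inputs)) * input_scale
    return W, W_in

def reservoir_step(W, W_in, x, u):
    """One recurrent update of the reservoir state x given observation u."""
    return np.tanh(W @ x + W_in @ u)

# Encode a short observation sequence. In an RL agent, the state x
# (rather than the raw observation u) would be fed to a trainable
# action policy, giving the policy a memory of past observations.
W, W_in = make_reservoir(n_inputs=3, n_units=50)
x = np.zeros(50)
for u in np.random.default_rng(1).normal(size=(10, 3)):
    x = reservoir_step(W, W_in, x, u)
```

Because the reservoir state accumulates a nonlinear trace of past observations, a memoryless policy reading `x` can, in principle, act on information that is no longer present in the current observation, which is what makes this setup relevant to the partial-observability hypothesis above.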