Animals often demonstrate a remarkable ability to adapt to their environments during their lifetime. This ability stems in part from the evolution of morphological and neural structures, which capture features of environments shared across generations and thereby bias and speed up lifetime learning. In this work, we propose a computational model for studying a mechanism that can enable such a process. We adopt a computational framework based on meta-reinforcement learning as a model of the interplay between evolution and development. At the evolutionary scale, we evolve reservoirs, a family of recurrent neural networks that differ from conventional networks in that one optimizes not the weight values but the hyperparameters of the architecture; the latter control macro-level properties such as memory and dynamics. At the developmental scale, we employ these evolved reservoirs to facilitate the learning of a behavioral policy through Reinforcement Learning (RL). Within an RL agent, a reservoir encodes the environment state before providing it to an action policy. We evaluate our approach on several 2D and 3D simulated environments. Our results show that the evolution of reservoirs can improve learning on a diverse set of challenging tasks. We study three hypotheses in particular: that an architecture combining reservoirs and reinforcement learning could (1) solve tasks with partial observability, (2) generate oscillatory dynamics that facilitate the learning of locomotion tasks, and (3) facilitate the generalization of learned behaviors to new tasks unknown during the evolution phase.
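To make the reservoir-then-policy arrangement concrete, the following is a minimal sketch of a leaky echo state network whose recurrent weights are random and fixed, so that only a handful of hyperparameters shape its behavior. It is an illustration under stated assumptions, not the implementation used in this work: the class name `Reservoir`, the helper `encode_episode`, and the specific hyperparameters (`spectral_radius`, `leak_rate`, `input_scaling`) are hypothetical choices drawn from common reservoir computing practice. The evolutionary search over hyperparameters and the RL training of the policy are not shown; the policy weights here are random placeholders.

```python
import numpy as np


class Reservoir:
    """Leaky echo state network (illustrative sketch).

    Weights are drawn at random and never trained; only the
    hyperparameters passed to the constructor would be evolved.
    """

    def __init__(self, n_inputs, n_units=100, spectral_radius=0.9,
                 leak_rate=0.3, input_scaling=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input weights, scaled by a hyperparameter.
        self.w_in = input_scaling * rng.uniform(-1.0, 1.0, size=(n_units, n_inputs))
        # Fixed random recurrent weights, rescaled so that the largest
        # absolute eigenvalue equals the requested spectral radius.
        w = rng.normal(size=(n_units, n_units))
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w = w
        self.leak_rate = leak_rate
        self.state = np.zeros(n_units)

    def reset(self):
        """Clear the reservoir memory at the start of an episode."""
        self.state = np.zeros_like(self.state)

    def step(self, observation):
        """Update the internal state with one observation and return it."""
        pre_activation = self.w_in @ observation + self.w @ self.state
        self.state = ((1.0 - self.leak_rate) * self.state
                      + self.leak_rate * np.tanh(pre_activation))
        return self.state.copy()


def encode_episode(reservoir, observations):
    """Encode a sequence of observations into a sequence of reservoir states."""
    reservoir.reset()
    return np.stack([reservoir.step(obs) for obs in observations])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    observations = rng.normal(size=(20, 4))  # 20 time steps, 4-dim observations
    reservoir = Reservoir(n_inputs=4, n_units=50)
    features = encode_episode(reservoir, observations)
    # Placeholder linear policy; in the described setup this part is learned by RL.
    policy_weights = rng.normal(scale=0.1, size=(2, 50))
    actions = np.tanh(features @ policy_weights.T)
    print(features.shape, actions.shape)
```

Because the reservoir state depends on past inputs as well as the current one, the policy receives a summary of recent history instead of a single observation, which is the property the partial-observability hypothesis relies on.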