Throughout their long evolutionary history, natural species have learned to survive by evolving physical structures that adapt to environmental changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, and such an agent can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through ``morphology-environment co-evolution (MECE)'', in which the morphology is continually updated to adapt to the changing environment, while the environment is progressively modified to pose new challenges and stimulate further improvement of the morphology. This yields a curriculum for training generalizable RL agents whose morphology and policy are optimized across different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for the two policies, both based solely on the learning dynamics of the RL agent; and (2) we design a scheduler that automatically determines when to change the environment and the morphology. In experiments on two classes of tasks, the morphology and RL policies trained via MECE generalize significantly better to unseen test environments than state-of-the-art (SOTA) morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between the morphology and the environment is the key to this success.
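The co-evolution loop summarized above can be illustrated with a toy sketch. This is not the MECE implementation: the abstract does not define the two rewards or the scheduler, so every component below is a hypothetical stand-in. The agent's return history plays the role of its learning dynamics. A plateau in average return improvement (`should_update`, with assumed `window` and `threshold` values) triggers a change, and the scheduler is assumed to alternate between environment and morphology updates. A fixed difficulty increment replaces the learned environment policy, and a greedy one-step search scored by the change in return replaces the learned morphology policy. `toy_return` is an invented performance model.

```python
from dataclasses import dataclass, field


@dataclass
class LearningTracker:
    """Recent returns of the RL agent; every signal below is derived from this history."""
    history: list = field(default_factory=list)

    def record(self, ret: float) -> None:
        self.history.append(ret)

    def progress(self, window: int) -> float:
        """Average per-step change in return over the last `window` steps (0.0 if too few)."""
        if len(self.history) < window + 1:
            return 0.0
        return (self.history[-1] - self.history[-1 - window]) / window


def should_update(tracker: LearningTracker, window: int, threshold: float) -> bool:
    """Hypothetical scheduler: request a change once learning progress has plateaued."""
    return len(tracker.history) > window and tracker.progress(window) < threshold


def toy_return(skill: float, morph: float, difficulty: float) -> float:
    """Hypothetical performance model: skill scaled by how well the morphology fits the task."""
    return skill * max(0.0, 1.0 - abs(morph - difficulty))


def co_evolve(steps: int = 60, window: int = 5, threshold: float = 0.01):
    tracker = LearningTracker()
    skill, morph, difficulty = 0.0, 0.0, 0.0
    events = []
    for t in range(steps):
        skill += 0.2 * (1.0 - skill)  # one policy-training step with diminishing gains
        tracker.record(toy_return(skill, morph, difficulty))
        if not should_update(tracker, window, threshold):
            continue
        if len(events) % 2 == 0:
            # Stand-in for the learned environment policy: make the task harder.
            difficulty += 0.5
            events.append(("env", t))
        else:
            # Stand-in for the learned morphology policy: greedy choice among nearby
            # designs, scored by the change in return relative to the latest return.
            last = tracker.history[-1]
            candidates = [morph - 0.5, morph, morph + 0.5]
            morph = max(candidates, key=lambda m: toy_return(skill, m, difficulty) - last)
            events.append(("morph", t))
        tracker.history.clear()  # give the agent time to learn under the new setting
    return morph, difficulty, events


morph, difficulty, events = co_evolve()
print(morph, difficulty, events[:2])
```

With the default arguments, the environment is first made harder at step 16; after that, environment and morphology updates alternate every six steps, so the morphology keeps tracking the rising difficulty and both end at 2.0.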