Motivated by the human ability to adapt existing skills when learning new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies so that new behaviors can be learned quickly from similar tasks rather than from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy: the first tier augments the original state embedding to support modest changes in behavior, and the second further modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new locomotion styles, new task targets, changes in character morphology, and extensive changes in environment. Furthermore, it exhibits a significant increase in learning efficiency, as indicated by greatly reduced training times compared to training from scratch or to other approaches that modify existing policies.
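The abstract does not say how the state embedding is augmented or how the policy layers are modified. The sketch below is a minimal illustration under stated assumptions. It treats the first tier as a learnable, state-dependent offset added to the frozen embedding, and the second tier as a low-rank update to a frozen hidden layer. The class names, layer sizes, and the additive and low-rank forms are all hypothetical choices, not the paper's architecture. It shows one property such a design can have: when both adapters are zero-initialized, the adapted policy starts out identical to the pretrained one.

```python
import numpy as np

rng = np.random.default_rng(0)


class FrozenPolicy:
    """Hypothetical stand-in for a pretrained policy: encoder -> hidden layer -> action head.
    Its weights are never updated during adaptation."""

    def __init__(self, state_dim, latent_dim, action_dim):
        self.W_enc = rng.standard_normal((latent_dim, state_dim)) * 0.1
        self.W_hid = rng.standard_normal((latent_dim, latent_dim)) * 0.1
        self.W_out = rng.standard_normal((action_dim, latent_dim)) * 0.1

    def act(self, s):
        z = np.tanh(self.W_enc @ s)   # original state embedding
        h = np.tanh(self.W_hid @ z)   # policy hidden layer
        return self.W_out @ h         # action


class AdaptedPolicy:
    """Two-tier adaptation wrapped around a frozen policy (assumed forms, for illustration)."""

    def __init__(self, base, rank=2):
        latent_dim, state_dim = base.W_enc.shape
        self.base = base
        # Tier 1 (assumption): state-dependent offset added to the frozen embedding.
        self.W_inj = np.zeros((latent_dim, state_dim))
        # Tier 2 (assumption): low-rank update B @ A to the frozen hidden layer.
        # B starts at zero, so the update is zero until it is trained.
        self.A = rng.standard_normal((rank, latent_dim)) * 0.01
        self.B = np.zeros((latent_dim, rank))

    def act(self, s):
        z = np.tanh(self.base.W_enc @ s) + self.W_inj @ s
        h = np.tanh((self.base.W_hid + self.B @ self.A) @ z)
        return self.base.W_out @ h


base = FrozenPolicy(state_dim=8, latent_dim=16, action_dim=4)
adapted = AdaptedPolicy(base, rank=2)
s = rng.standard_normal(8)
# Zero-initialized adapters leave the pretrained behavior unchanged.
print(np.allclose(adapted.act(s), base.act(s)))  # True
```

In a training loop, only `W_inj`, `A`, and `B` would receive gradient updates while the base weights stay fixed. Starting from the unchanged pretrained behavior is one plausible reason such adaptation could need less training than learning from scratch.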