Motivated by humans' ability to adapt existing skills when learning new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies so that new behaviors for similar tasks can be learned quickly rather than from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy: the first tier augments the original state embedding to support modest changes in a behavior, and the second further modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new locomotion styles, new task targets, changes in character morphology, and extensive changes in environment. Furthermore, it exhibits a significant increase in learning efficiency, as indicated by greatly reduced training times compared to training from scratch or to other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet.
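To make the two-tier idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a toy pre-trained policy of the form state -> latent -> hidden -> action, and it assumes that both added components are zero-initialized so that the adapted policy starts out identical to the original one. The names BasePolicy, AdaptedPolicy, latent_offset, and hidden_branch are hypothetical and chosen only for illustration.

```python
import math
import random

def linear(weights, bias, x):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def tanh_vec(v):
    return [math.tanh(a) for a in v]

def rand_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def zero_layer(n_out, n_in):
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out

class BasePolicy:
    """Toy stand-in for a pre-trained controller (assumed structure)."""
    def __init__(self, state_dim, latent_dim, hidden_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.enc = rand_layer(latent_dim, state_dim, rng)   # state embedding
        self.hid = rand_layer(hidden_dim, latent_dim, rng)  # policy layer
        self.out = rand_layer(action_dim, hidden_dim, rng)  # action head

    def act(self, state):
        z = tanh_vec(linear(*self.enc, state))
        h = tanh_vec(linear(*self.hid, z))
        return linear(*self.out, h)

class AdaptedPolicy:
    """Wraps a frozen base policy with two trainable, zero-initialized parts."""
    def __init__(self, base, state_dim, latent_dim, hidden_dim):
        self.base = base  # base weights are reused and never modified here
        # Tier 1 (assumption): an additive offset on the state embedding.
        self.latent_offset = zero_layer(latent_dim, state_dim)
        # Tier 2 (assumption): a parallel branch added to a policy layer.
        self.hidden_branch = zero_layer(hidden_dim, latent_dim)

    def act(self, state):
        z = tanh_vec(linear(*self.base.enc, state))
        dz = linear(*self.latent_offset, state)
        z = [a + b for a, b in zip(z, dz)]        # augmented embedding
        h_pre = linear(*self.base.hid, z)
        dh = linear(*self.hidden_branch, z)
        h = tanh_vec([a + b for a, b in zip(h_pre, dh)])  # modified layer
        return linear(*self.base.out, h)

base = BasePolicy(state_dim=4, latent_dim=3, hidden_dim=5, action_dim=2)
adapted = AdaptedPolicy(base, state_dim=4, latent_dim=3, hidden_dim=5)
state = [0.3, -0.1, 0.7, 0.2]
same_at_start = adapted.act(state) == base.act(state)
```

In this sketch only latent_offset and hidden_branch would be updated during adaptation, which is one plausible reading of why adapting can be cheaper than training from scratch: the new behavior starts from the original controller's output and only a small set of parameters has to move.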