Motivated by humans' ability to adapt existing skills when learning new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies so that new behaviors can be learned quickly from similar tasks rather than from scratch. Building on a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy: the first tier augments the original state embedding to support modest changes in a behavior, and the second modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new locomotion styles, new task targets, changes in character morphology, and extensive changes in environment. Furthermore, it exhibits a significant increase in learning efficiency, as indicated by greatly reduced training times compared with training from scratch or with other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet.
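The two-tier idea described above can be illustrated with a minimal sketch. This is not the released AdaptNet implementation: the class names `BasePolicy` and `AdaptedPolicy`, the weights `W_inject` and `W_branch`, the single linear layers with `tanh`, and the zero initialization of the new weights are all assumptions chosen for illustration. The sketch only shows how a frozen controller might be wrapped so that one tier shifts the state embedding and a second tier adds a parallel branch inside the policy layers.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


class BasePolicy:
    """Hypothetical stand-in for a pre-trained, frozen controller."""

    def __init__(self, state_dim, latent_dim, action_dim, rng):
        self.W_embed = rand_matrix(latent_dim, state_dim, rng)
        self.W_hidden = rand_matrix(latent_dim, latent_dim, rng)
        self.W_out = rand_matrix(action_dim, latent_dim, rng)

    def embed(self, state):
        # Original state embedding (the latent space being adapted).
        return [math.tanh(v) for v in matvec(self.W_embed, state)]

    def act_from_latent(self, z, hidden_delta=None):
        # Policy layers; hidden_delta is an optional additive branch output.
        h = matvec(self.W_hidden, z)
        if hidden_delta is not None:
            h = add(h, hidden_delta)
        return matvec(self.W_out, [math.tanh(v) for v in h])


class AdaptedPolicy:
    """Wraps a frozen BasePolicy with two sets of new, trainable weights."""

    def __init__(self, base, state_dim, latent_dim):
        self.base = base
        # Tier 1 (assumed form): additive shift of the state embedding.
        self.W_inject = zeros(latent_dim, state_dim)
        # Tier 2 (assumed form): parallel branch added inside the policy layers.
        self.W_branch = zeros(latent_dim, latent_dim)

    def act(self, state):
        z = add(self.base.embed(state), matvec(self.W_inject, state))
        return self.base.act_from_latent(z, matvec(self.W_branch, z))


rng = random.Random(0)
base = BasePolicy(state_dim=3, latent_dim=4, action_dim=2, rng=rng)
adapted = AdaptedPolicy(base, state_dim=3, latent_dim=4)
state = [0.3, -0.2, 0.5]
# With zero-initialized adapter weights, the adapted policy matches the base.
print(adapted.act(state) == base.act_from_latent(base.embed(state)))
```

Under the zero-initialization assumption, adaptation starts from exactly the original behavior, and only `W_inject` and `W_branch` would be updated during training on the new task while the base weights stay fixed.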