In this study, we propose a multitask reinforcement learning algorithm for foundational policy acquisition to generate novel motor skills. \textcolor{\hcolor}{Learning a rich representation of a multitask policy is a challenge in dynamic movement generation tasks because the policy must cope with changes in goals or environments that have different reward functions or physical parameters. Inspired by human sensorimotor adaptation mechanisms, we developed a learning pipeline that constructs encoder-decoder networks and performs network selection to facilitate foundational policy acquisition under multiple situations. First, we compared the proposed method with previous multitask reinforcement learning methods on standard multi-locomotion tasks. The results showed that the proposed approach outperformed the baseline methods. Then, we applied the proposed method to a ball-heading task using a monopod robot model to evaluate skill generation performance. The results showed that the proposed method was able not only to adapt to novel target positions and previously unseen ball restitution coefficients, but also to acquire a foundational policy network, originally learned for the heading motion, that can generate an entirely new overhead kicking skill.}