In dynamic motion generation tasks that involve contact and collisions, small changes in policy parameters can lead to very different returns. For example, in soccer, a nearly identical heading motion can send the ball in completely different directions when the hitting position or the force applied to the ball changes slightly, or when the friction of the ball varies. However, it is unlikely that heading a ball in different directions requires completely different skills. In this study, we propose a multitask reinforcement learning algorithm that adapts a policy to implicit changes in goals or environments within a single motion category, where tasks differ in their reward functions or in the physical parameters of the environment. We evaluated the proposed method on a ball heading task using a monopod robot model. The results show that the proposed method can adapt to implicit changes in the goal position or the coefficient of restitution of the ball, whereas the standard domain randomization approach cannot cope with the different task settings.